
2013 | Book

Artificial Intelligence in Medicine

14th Conference on Artificial Intelligence in Medicine, AIME 2013, Murcia, Spain, May 29 – June 1, 2013. Proceedings

Editors: Niels Peek, Roque Marín Morales, Mor Peleg

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 14th Conference on Artificial Intelligence in Medicine, AIME 2013, held in Murcia, Spain, in May/June 2013. The 43 revised full and short papers presented were carefully reviewed and selected from 82 submissions. The papers are organized in the following topical sections: decision support, guidelines and protocols; semantic technology; bioinformatics; machine learning; probabilistic modeling and reasoning; image and signal processing; temporal data visualization and analysis; and natural language processing.

Table of Contents

Frontmatter

Decision Support, Guidelines and Protocols

From Decision to Shared-Decision: Introducing Patients’ Preferences in Clinical Decision Analysis - A Case Study in Thromboembolic Risk Prevention

The EU project MobiGuide focuses on the development of a patient-centric decision support system based on clinical guidelines. The project addresses patients with chronic illnesses, including atrial fibrillation (AF). In this paper we describe a shared-decision model framework to address those situations, described in the guideline, where the lack of hard evidence makes it important for the care provider to share the decision with the patient and/or his relatives. To illustrate this, we focus on an important topic tackled in the AF guideline: thromboembolic risk prevention. We introduce a utility model and a cost model to collect the patient's preferences. On the basis of these preferences and of literature data, a decision model is implemented to compare different therapeutic options. The development of this framework increases the involvement of patients in the process of care, focusing on the centrality of individual subjects.

Lucia Sacchi, Carla Rognoni, Stefania Rubrichi, Silvia Panzarasa, Silvana Quaglini
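The decision model described above weighs therapeutic options by combining outcome probabilities with patient-elicited utilities. A minimal sketch of such an expected-utility comparison (all probabilities, utilities, and option names below are made-up illustrations, not the paper's actual models):

```python
# Toy expected-utility comparison of two therapeutic options.
# Probabilities and utilities are hypothetical; in the paper they come
# from literature data and elicited patient preferences respectively.

def expected_utility(outcomes):
    # outcomes: list of (probability, utility) pairs; probabilities sum to 1.
    return sum(p * u for p, u in outcomes)

# Hypothetical outcome distributions: (uneventful, bleeding, stroke) etc.
anticoagulant = [(0.90, 0.95), (0.07, 0.40), (0.03, 0.0)]
no_treatment  = [(0.80, 1.00), (0.20, 0.30)]

ua = expected_utility(anticoagulant)
un = expected_utility(no_treatment)
# The option with the higher expected utility is recommended for discussion.
print(max(("anticoagulant", ua), ("no treatment", un), key=lambda t: t[1]))
```

With these toy numbers the anticoagulant option has the higher expected utility; changing the patient's utilities can flip the recommendation, which is the point of eliciting preferences.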
Model-Based Combination of Treatments for the Management of Chronic Comorbid Patients

The prevalence of chronic diseases is growing year after year. This implies that health care systems must deal with an increasing number of patients with several simultaneous pathologies (i.e., comorbid patients), which involves interventions combining primary, specialist, and hospital care. Clinical practice guidelines provide evidence-based information on these interventions, but only on individual pathologies. This creates an urgent need to develop ways of merging multiple single-disease interventions to provide professional assistance to comorbid patients. Here, we propose an integrated care model formalizing the treatment of chronic comorbid patients across primary, specialist and hospital care. The model establishes the baseline of a divide-and-conquer approach to the complex task of multiple therapy combination that was tested on the comorbidity of hypertension and chronic heart failure.

David Riaño, Antoni Collado
Using Constraint Logic Programming to Implement Iterative Actions and Numerical Measures during Mitigation of Concurrently Applied Clinical Practice Guidelines

There is a pressing need in clinical practice to mitigate (identify and address) adverse interactions that occur when a comorbid patient is managed according to multiple concurrently applied disease-specific clinical practice guidelines (CPGs). In our previous work we described an automatic algorithm for mitigating pairs of CPGs. The algorithm constructs logical models of processed CPGs and employs constraint logic programming to solve them. However, the original algorithm was unable to handle two important issues frequently occurring in CPGs – iterative actions forming a cycle and numerical measurements. Dealing with these two issues in practice relies on a physician's knowledge and the manual analysis of CPGs. Yet for guidelines to be considered a stand-alone, easy-to-use clinical decision support tool, this process needs to be automated. In this paper we take an additional step towards building such a tool by extending the original mitigation algorithm to handle cycles and numerical measurements present in CPGs.

Martin Michalowski, Szymon Wilk, Wojtek Michalowski, Di Lin, Ken Farion, Subhra Mohapatra
A Multi-agent Planning Approach for the Generation of Personalized Treatment Plans of Comorbid Patients

This work addresses the generation of a personalized treatment plan from multiple clinical guidelines, for a patient with multiple diseases (comorbid patient), as a multi-agent cooperative planning process that provides support to collaborative medical decision-making. The proposal is based on a multi-agent planning architecture in which each agent is capable of (1) planning a personalized treatment from a temporal Hierarchical Task Network (HTN) representation of a single-disease guideline, and (2) coordinating with other planning agents by both sharing disease-specific knowledge and resolving any conflicts that may arise when conciliating different guidelines by merging single-disease treatment plans. The architecture follows a life cycle that, starting from a common specification of the main high-level steps of a treatment for a given comorbid patient, results in a detailed treatment plan without harmful interactions among the single-disease personalized treatments.

Inmaculada Sánchez-Garzón, Juan Fdez-Olivares, Eva Onaindía, Gonzalo Milla, Jaume Jordán, Pablo Castejón
Merging Disease-Specific Clinical Guidelines to Handle Comorbidities in a Clinical Decision Support Setting

From a clinical decision support perspective, the treatment of comorbid diseases is a challenge since it demands coordination between the disease-specific therapeutic plans of the comorbid diseases. Although clinical guidelines provide clinical recommendations, they focus on a single disease, so comorbid disease management requires multiple concurrently active clinical guidelines. Merging computerized clinical practice guidelines (CPGs) related to comorbidities and using them in clinical decision support systems is a potential solution for managing comorbidities. We have developed a framework to merge computerized CPGs. The central aspect of our framework is a merge representation ontology that captures the criteria needed to merge multiple CPGs whilst satisfying medical, workflow, institutional and temporal constraints. We have used our framework successfully to create therapy plans for patients treated for atrial fibrillation and chronic heart failure comorbidity.

Borna Jafarpour, Syed Sibte Raza Abidi
Multiparty Argumentation Game for Consensual Expansion Applied to Evidence Based Medicine

Evidence based medicine (EBM) requires many different sources of knowledge when dealing with complex patients. Such a discipline inherently involves the issue of conflicts arising amongst arguments coming from different sources, such as guidelines, trials and clinical studies. In this paper we consider a set of agents, each with its own medical argumentation, which exchange medical arguments to enrich their own knowledge and suggest a set of treatments resulting from the argumentation process.

Stefano Bromuri, Maxime Morge

Semantic Technology I

Rule-Based Formalization of Eligibility Criteria for Clinical Trials

In this paper, we propose a rule-based formalization of eligibility criteria for clinical trials. The rule-based formalization is implemented by using the logic programming language Prolog. Compared with existing formalizations such as pattern-based and script-based languages, the rule-based formalization has the advantages of being declarative, expressive, reusable and easy to maintain. Our rule-based formalization is based on a general framework for eligibility criteria containing three types of knowledge: (1) trial-specific knowledge, (2) domain-specific knowledge and (3) common knowledge. This framework enables the reuse of several parts of the formalization of eligibility criteria. We have implemented the proposed rule-based formalization in SemanticCT, a semantically-enabled system for clinical trials, showing the feasibility of using our rule-based formalization of eligibility criteria for supporting patient recruitment in clinical trial systems.

Zhisheng Huang, Annette ten Teije, Frank van Harmelen
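The abstract above describes eligibility criteria as declarative rules layered over trial-specific, domain and common knowledge. A toy Python analogue of that idea (the paper itself uses Prolog; the criteria and patient fields below are hypothetical illustrations, not from SemanticCT):

```python
# Toy rule-based eligibility check. Each rule is a small predicate over a
# patient record; a patient is eligible when every rule succeeds.

def rule_age(patient):
    # Trial-specific knowledge: age must be between 18 and 70 (hypothetical).
    return 18 <= patient["age"] <= 70

def rule_no_prior_chemo(patient):
    # Trial-specific exclusion criterion (hypothetical).
    return "chemotherapy" not in patient["history"]

def rule_postmenopausal(patient):
    # Domain knowledge: infer status from age when not explicitly recorded.
    return patient.get("postmenopausal", patient["age"] > 55)

CRITERIA = [rule_age, rule_no_prior_chemo, rule_postmenopausal]

def eligible(patient):
    # Declarative flavour: eligibility is the conjunction of the rules,
    # so rules can be reused and maintained independently.
    return all(rule(patient) for rule in CRITERIA)

print(eligible({"age": 60, "history": ["surgery"]}))  # True
```

The separation into per-criterion predicates mirrors the reuse the authors get from splitting trial-specific, domain-specific and common knowledge.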
Characterizing Health-Related Information Needs of Domain Experts

In the information retrieval literature, understanding the users' intents behind queries is critically important to gain better insight into how to select relevant results. While many studies have investigated how users in general carry out exploratory health searches in digital environments, few have focused on how queries are formulated, specifically by domain expert users. This study intends to fill this gap by studying 173 health expert queries issued from 3 medical information retrieval tasks within 2 different evaluation campaigns. A statistical analysis has been carried out to study both the variation and correlation of health-query attributes such as length, clarity and specificity of either clinical or non-clinical queries. The knowledge gained from the study has an immediate impact on the design of future health information seeking systems.

Eya Znaidi, Lynda Tamine, Cecile Chouquet, Chiraz Latiri
Comparison of Clustering Approaches through Their Application to Pharmacovigilance Terms

In different applications (i.e., information retrieval, filtering or analysis), it is useful to detect similar terms and to provide the possibility to use them jointly. Clustering of terms is one of the methods which can be exploited for this. In our study, we propose to test three methods dedicated to the clustering of terms (hierarchical ascendant classification, Radius and maximum), to combine them with the semantic distance algorithms and to compare them through the results they provide when applied to terms from the pharmacovigilance area. The comparison indicates that the non-disjoint clusterings (Radius and maximum) outperform the disjoint clusters by 10 to 20 points in all the experiments.

Marie Dupuch, Christopher Engström, Sergei Silvestrov, Thierry Hamon, Natalia Grabar
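The Radius method mentioned above produces non-disjoint clusters: each term anchors a cluster of everything within a semantic distance threshold, so clusters may overlap. A minimal sketch with made-up terms and distances (the paper uses real semantic distances between pharmacovigilance terms):

```python
# Sketch of non-disjoint "Radius"-style clustering over a toy distance matrix.
# Terms and distances are hypothetical illustrations.

TERMS = ["nausea", "vomiting", "headache", "migraine"]

DIST = {  # symmetric toy semantic distances in [0, 1]
    ("nausea", "vomiting"): 0.2,
    ("nausea", "headache"): 0.8,
    ("nausea", "migraine"): 0.9,
    ("vomiting", "headache"): 0.85,
    ("vomiting", "migraine"): 0.9,
    ("headache", "migraine"): 0.15,
}

def dist(a, b):
    if a == b:
        return 0.0
    return DIST.get((a, b), DIST.get((b, a)))

def radius_clusters(terms, r):
    # One cluster per centre term: all terms within radius r. Clusters from
    # different centres may overlap, i.e. the clustering is non-disjoint.
    clusters = []
    for center in terms:
        cluster = frozenset(t for t in terms if dist(center, t) <= r)
        if cluster not in clusters:  # drop exact duplicates
            clusters.append(cluster)
    return clusters

print(radius_clusters(TERMS, 0.3))
```

With this toy matrix the threshold 0.3 yields two clusters, {nausea, vomiting} and {headache, migraine}; a looser radius would make them overlap.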
Trusting Intensive Care Unit (ICU) Medical Data: A Semantic Web Approach

The Intensive Care Unit (ICU) domain generates large volumes of patient data which can be used in medical research. However, inaccuracies often exist in this data and due to the data’s size and domain complexity, automated approaches are required to associate a level of quality and trust with the data. We describe a computational framework to perform such assessments based on semantic web technologies. Linked data enables integration with other datasets, which can be used along with details of the data’s provenance and medical domain knowledge from appropriate ontologies. We have successfully applied the framework to two types of ICUs: general medical and traumatic brain injury.

Laura Moss, David Corsar, Ian Piper, John Kinsella
Learning Formal Definitions for Snomed CT from Text

Snomed CT is a widely used medical ontology which is formally expressed in a fragment of the Description Logic $\mathcal{EL}\text{++}$. The underlying logics allow for expressive querying, yet make it costly to maintain and extend the ontology. In this paper we present an approach for the extraction of Snomed CT definitions from natural language text. We test and evaluate the approach using two types of texts.

Yue Ma, Felix Distel
Towards Automatic Patient Eligibility Assessment: From Free-Text Criteria to Queries

The presented work contributes to bridging the representation of clinical trials and patient data. Our ultimate goal is to support trial recruitment by automating the process of formalizing eligibility criteria of clinical trials, starting from the free text of criteria and leading to a computable representation. This paper discusses the final step in the pipeline, i.e., generating queries from the structured representation consisting of detected patterns and semantic entities. The queries make it possible to evaluate patient eligibility for a given trial. To enable easy incorporation of semantic reasoning using medical ontologies, we built the queries in SPARQL and use the OWL representation of one of the standards for patient data storage - openEHR archetypes - and the NCI ontology. The available public repository of archetypes and the expressivity of SPARQL allow us to create template queries for the majority of patterns.

Krystyna Milian, Annette ten Teije
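The template-query idea above can be sketched as filling a parametrized SPARQL string from a structured criterion. The prefix, predicate and class names below are hypothetical placeholders, not the actual openEHR/NCI identifiers the authors use:

```python
# Sketch of turning a structured eligibility criterion into a SPARQL query.
# All IRIs and predicates are made-up placeholders for illustration.

SPARQL_TEMPLATE = """
PREFIX ehr: <http://example.org/ehr#>
SELECT ?patient WHERE {{
  ?patient ehr:hasObservation ?obs .
  ?obs ehr:concept <{concept}> ;
       ehr:value ?v .
  FILTER (?v {op} {value})
}}
"""

def criterion_to_sparql(concept, op, value):
    # A detected pattern like "HbA1c < 7.0" maps to (concept, op, value).
    return SPARQL_TEMPLATE.format(concept=concept, op=op, value=value)

query = criterion_to_sparql("http://example.org/loinc#hba1c", "<", 7.0)
print(query)
```

One template per pattern family is what the expressivity argument in the abstract amounts to: most criteria instantiate a small set of such templates.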

Bioinformatics

Biomedical Knowledge Extraction Using Fuzzy Differential Profiles and Semantic Ranking

Recently, technologies such as DNA microarrays have made it possible to generate large-scale transcriptomic data with the aim of exploring the background of genes. The analysis and interpretation of such data require large databases and efficient mining methods in order to extract the specific biological functions belonging to a group of genes of an expression profile. To this aim, we propose here a new approach for mining transcriptomic data combining domain knowledge and classification methods. Firstly, we propose the definition of Fuzzy Differential Gene Expression Profiles (FG-DEP) based on fuzzy classification and a differential definition between the considered biological situations. Secondly, we use our previously defined semantic similarity measure (called IntelliGO), applied to Gene Ontology (GO) annotation terms, to compute semantic and functional similarities between genes of the resulting FG-DEP and well-known genetic markers involved in the development of cancers. The similarity matrices are then used to introduce a novel Functional Spectral Representation (FSR), calculated through a semantic ranking of genes with respect to their similarities with the tumoral markers. The FSR representation should help experts interpret transcriptomic data in a new way and infer new genes having similar biological functions with respect to well-known diseases.

Availability: The semantic similarity measure and the ranking method are available at http://plateforme-mbi.loria.fr/intelligo/ranking.php.

Sidahmed Benabderrahmane
Knowledge-Based Identification of Multicomponent Therapies

In recent years, several approaches have been proposed to improve the capacity of pharmaceutical research to support personalized care. An approach that takes advantage of the large amount of biological knowledge continuously collected in different repositories could improve the drug discovery process. In this context, networks are increasingly used as universal platforms to integrate the knowledge available on a complex disease. The objective of this work is to provide a knowledge-based strategy to support polypharmacology, a new promising approach for drug discovery. Given a specific disease, the proposed method is able to identify the possible targets by analysing the topological features of the related network. The network-based analysis defines a score aimed at ranking the targets and selecting their best combinations. The results obtained on Type 2 Diabetes Mellitus highlight the ability of the method to retrieve novel target candidates related to the considered disease.

Francesca Vitali, Francesca Mulas, Pietro Marini, Riccardo Bellazzi
Enhancing Random Forests Performance in Microarray Data Classification

Random forests are receiving increasing attention for classification of microarray datasets. We evaluate the effects of a feature selection process on the performance of a random forest classifier as well as on the choice of two critical parameters, i.e., the forest size and the number of features chosen at each split in growing trees. Results of our experiments suggest that parameters lower than popular default values can lead to effective and more parsimonious classification models. Growing few trees on small subsets of selected features, while randomly choosing a single variable at each split, results in classification performance that compares well with state-of-the-art studies.

Nicoletta Dessì, Gabriele Milia, Barbara Pes
Copy–Number Alterations for Tumor Progression Inference

Copy–number alterations (CNAs) represent an important component of genetic variations and play a significant role in many human diseases. Such alterations are related to certain types of cancers, including those of the pancreas, colon, and breast, among others. CNAs have been used as biomarkers for cancer prognosis in multiple studies, but few works report on the relation of CNAs with the disease progression. In this paper, we provide cases where the inference on the disease progression improves when exploiting CNA information. To this aim, a specific dissimilarity-based representation of patients is given. The employed framework outperforms a typical approach where patients are represented through a set of available attribute values. Three datasets were employed to validate the results of our analysis.

Claudia Cava, Italo Zoppis, Manuela Gariboldi, Isabella Castiglioni, Giancarlo Mauri, Marco Antoniotti
Constraining Protein Docking with Coevolution Data for Medical Research

Protein interaction is essential to all biological systems, from the assembly of multimeric complexes to processes such as transport, catalysis and gene regulation. Unfortunately, the prediction of protein-protein interactions is a difficult problem, often with modest success rates, in part because docking algorithms must filter a very large number of possibilities and then attempt to identify a correct model among many incorrect candidates. This paper presents a scoring function to estimate contacts in coevolving proteins, shows how the predicted contacts can constrain the filtering stage and significantly reduce the number of incorrect candidates, and illustrates the application of this method to the docking of two complexes of medical relevance, one involving a chromosome condensation regulator homologous to a protein responsible for retinitis pigmentosa and the other a cyclin-dependent kinase, a likely target for cancer therapy.

Ludwig Krippahl, Fábio Madeira, Pedro Barahona

Machine Learning

Single- and Multi-label Prediction of Burden on Families of Schizophrenia Patients

Whereas there exist questionnaires used to measure the level of anxiety or depression in caregivers of schizophrenia patients, sometimes these symptoms take too long to be detected and the treatment needed is more difficult than it would have been if the burden had been detected at an earlier stage. In this paper we propose the use of automatic classification techniques to predict the output of such questionnaires (Hamilton and ECFOS-II), making it possible to anticipate an appropriate treatment or advice for the family caregivers from Primary Care consultations. In particular, we apply standard (one class variable) and multi-dimensional classification approaches to predict caregiver anxiety, depression and answers to questionnaires. Our study has been carried out with a dataset containing data from 180 schizophrenia patients and their caregivers, and the results are very promising, obtaining accuracies of approximately 96%.

Pablo Bermejo, Marta Lucas, José A. Rodríguez-Montes, Pedro J. Tárraga, Javier Lucas, José A. Gámez, José M. Puerta
Predicting Adverse Drug Events by Analyzing Electronic Patient Records

Diagnosis codes for adverse drug events (ADEs) are sometimes missing from electronic patient records (EPRs). This may not only affect patient safety in the worst case, but also the number of reported ADEs, resulting in incorrect risk estimates of prescribed drugs. Large databases of electronic patient records (EPRs) are potentially valuable sources of information to support the identification of ADEs. This study investigates the use of machine learning for predicting one specific ADE based on information extracted from EPRs, including age, gender, diagnoses and drugs. Several predictive models are developed and evaluated using different learning algorithms and feature sets. The highest observed AUC is 0.87, obtained by the random forest algorithm. The resulting model can be used for screening EPRs that are not, but possibly should be, assigned a diagnosis code for the ADE under consideration. Preliminary results from using the model are presented.

Isak Karlsson, Jing Zhao, Lars Asker, Henrik Boström
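The abstract above reports model quality as AUC, the probability that a randomly chosen positive case is ranked above a randomly chosen negative one. A minimal, self-contained way to compute it (toy labels and scores; this illustrates the metric only, not the authors' random forest pipeline):

```python
# Minimal ROC AUC from labels and predicted scores (toy data).

def auc(labels, scores):
    """Fraction of positive/negative pairs where the positive scores higher,
    counting ties as half a win (the standard ROC AUC)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.5, 0.6, 0.2, 0.8, 0.4]  # e.g. predicted ADE probabilities
print(auc(labels, scores))  # 8 of 9 pairs correctly ordered
```

An AUC of 0.87, as reported in the abstract, means 87% of such positive/negative pairs are ordered correctly by the model's scores.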
Top-Level MeSH Disease Terms Are Not Linearly Separable in Clinical Trial Abstracts

Assessments of the efficacy and safety of medical interventions are based on systematic reviews of clinical trials. Systematic reviewing requires the screening of vast amounts of publications, which is currently done by hand. To reduce the number of publications that are screened manually, we propose the automated classification of publications by disease category using Support Vector Machines. We base our classification on the ontological structure of the Medical Subject Headings (MeSH) by treating all terms as their top-level disease category. Unfortunately the resulting classifier lacks sufficient sensitivity for use by systematic reviewers. We argue that this is partially due to the inseparability of the terminology into the disease categories and discuss how future work could address this problem.

Joël Kuiper, Gert van Valkenhoef

Probabilistic Modelling and Reasoning

Understanding the Co-occurrence of Diseases Using Structure Learning

Multimorbidity, i.e., the presence of multiple diseases within one person, is a significant health-care problem for western societies: diagnosis, prognosis and treatment in the presence of multiple diseases can be complex due to the various interactions between diseases. To better understand the co-occurrence of diseases, we propose Bayesian network structure learning methods for deriving the interactions between risk factors. In particular, we propose novel measures for structural relationships in the co-occurrence of diseases and identify the critical factors in this interaction. We illustrate these measures in the oncological area for better understanding co-occurrences of malignant tumours.

Martijn Lappenschaar, Arjen Hommersom, Joep Lagro, Peter J. F. Lucas
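Before learning full network structures, co-occurrence can be quantified by comparing how often two diseases appear together against what independence would predict. A toy sketch of that baseline (hypothetical records; the paper learns Bayesian network structures, which capture far more than this pairwise measure):

```python
# Toy pairwise co-occurrence measure ("lift") over hypothetical patient
# records, each a set of diagnoses.

records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "copd"},
    {"hypertension"},
    {"diabetes"},
    {"copd"},
    {"diabetes", "hypertension"},
]

def p(disease):
    # Marginal frequency of a diagnosis across records.
    return sum(disease in r for r in records) / len(records)

def lift(a, b):
    # Joint frequency relative to what independence would predict;
    # lift > 1 suggests the diseases co-occur more than by chance.
    p_joint = sum(a in r and b in r for r in records) / len(records)
    return p_joint / (p(a) * p(b))

print(lift("diabetes", "hypertension"))  # > 1 on this toy data
```

Structure learning generalizes this: instead of one pairwise ratio, it searches for a whole graph of dependencies that best explains all the marginals and joints at once.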
Online Diagnostic System Based on Bayesian Networks

In this paper we present a general medical diagnostic expert system intended to serve as an educational self-diagnostic tool, openly available through the WWW. The system has been designed as an alternative to the common self-diagnosis practice among the general public of searching the Internet, finding the first disease with some matching symptoms, and treating this as a diagnosis, in contrast with the differential diagnosis offered by our system. We discuss the medical knowledge elicitation process, automated generation of Bayesian network models, and the diagnostic process. The system uses a scalable and efficient distributed reasoning engine based on multiple Bayesian networks. An analysis of over 100,000 diagnostic cases is presented. The cases are analyzed based on population characteristics such as age and gender. The results show the need for medical education and highlight the most common problems in non-emergency medical care.

Adam Zagorecki, Piotr Orzechowski, Katarzyna Hołownia
A Probabilistic Graphical Model for Tuning Cochlear Implants

Severe and profound hearing losses can be treated with cochlear implants (CI). Given that a CI may have up to 150 tunable parameters, adjusting them is a highly complex task. For this reason, we decided to build a decision support system based on a new type of probabilistic graphical model (PGM) that we call tuning networks. Given the results of a set of audiological tests and the current status of the parameter set, the system looks for the set of changes in the parameters of the CI that will lead to the biggest improvement in the user’s hearing ability. Because of the high number of variables involved in the problem we have used an object-oriented approach to build the network. The prototype has been informally evaluated comparing its advice with those of the expert and of a previous decision support system based on deterministic rules. Tuning networks can be used to adjust other electrical or mechanical devices, not only in medicine.

Iñigo Bermejo, Francisco Javier Díez, Paul Govaerts, Bart Vaerenberg

Image and Signal Processing

Semi-supervised Projected Clustering for Classifying GABAergic Interneurons

A systematic classification of neuron types is a critical topic of debate in neuroscience. In this study, we propose a semi-supervised projected clustering algorithm based on finite mixture models and the expectation-maximization (EM) algorithm that is useful for classifying neuron types. Specifically, we analyzed cortical GABAergic interneurons from different animals and cortical layers. The new algorithm, called SeSProC, is a probabilistic approach for classifying known classes and for discovering possible new groups of interneurons. Basic morphological features containing information about axonal and dendritic arborization sizes and orientations are used to characterize the interneurons. SeSProC also identifies the relevance of each feature and group separately. This article aims to present the methodological approach, reporting results for known classes and possible new groups of interneurons.

Luis Guerra, Ruth Benavides-Piccione, Concha Bielza, Víctor Robles, Javier DeFelipe, Pedro Larrañaga
Cascaded Rank-Based Classifiers for Detecting Clusters of Microcalcifications

A Computer Aided Detection (CAD) system frequently has to deal with a significant skew between the positive and negative classes. For this reason we propose a solution based on an ensemble of classifiers structured as a “cascade” of dichotomizers, where each node is robust to such skew since it is trained by a learning algorithm based on ranking instead of classification error. The proposed approach has been applied to the detection of clusters of microcalcifications in mammograms and has shown good performance in comparison with other methods well suited to deal with unbalanced problems.

Alessandro Bria, Claudio Marrocco, Mario Molinara, Francesco Tortorella
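The cascade structure described above rejects most negatives at early, cheap stages and only flags candidates that survive every stage. A minimal sketch of that control flow (stage thresholds and feature indices are hypothetical; the paper's stages are trained rank-based dichotomizers, not fixed thresholds):

```python
# Sketch of a cascade of dichotomizers. Each stage passes or rejects a
# candidate; only candidates surviving all stages are flagged positive.

def make_stage(index, threshold):
    # A toy stage: fires "pass" when one chosen feature exceeds a threshold.
    return lambda features: features[index] > threshold

# Hypothetical three-stage cascade over a 3-dimensional feature vector.
CASCADE = [make_stage(0, 0.2), make_stage(1, 0.5), make_stage(2, 0.7)]

def detect(features):
    # all() short-circuits, so easy negatives exit at the first stage that
    # rejects them; this early rejection is what makes cascades fast under
    # heavy class skew.
    return all(stage(features) for stage in CASCADE)

print(detect([0.9, 0.8, 0.9]))  # True: survives every stage
print(detect([0.1, 0.8, 0.9]))  # False: rejected by the first stage
```

Training each stage on a ranking criterion, as the authors do, keeps individual stages from being swamped by the majority (negative) class.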
Segmenting Neuroblastoma Tumor Images and Splitting Overlapping Cells Using Shortest Paths between Cell Contour Convex Regions

Neuroblastoma is one of the most fatal paediatric cancers. One of the major prognostic factors for neuroblastoma tumour is the total number of neuroblastic cells. In this paper, we develop a fully automated system for counting the total number of neuroblastic cells within the images derived from Hematoxylin and Eosin stained histological slides by considering the overlapping cells. We finally propose a novel multi-stage cell counting algorithm, in which cellular regions are extracted using an adaptive thresholding technique. Overlapping and single cells are discriminated using morphological differences. We propose a novel cell splitting algorithm to split overlapping cells into single cells using the shortest path between contours of convex regions.

Siamak Tafavogh, Karla Felix Navarro, Daniel R. Catchpoole, Paul J. Kennedy
Classification of Early-Mild Subjects with Parkinson’s Disease by Using Sensor-Based Measures of Posture, Gait, and Transitions

Evaluation of posture, gait, turning, and different kinds of transitions are key components of the clinical evaluation of Parkinson's disease (PD). The aim of this study is to assess the feasibility of using accelerometers to classify early PD subjects (two evaluations over a 1-year follow-up) with respect to age-matched control subjects. Classifying PD subjects at an early stage would provide a tool able to follow the progression of the disease from the early phases to the last ones and to evaluate the efficacy of different treatments. Two functional tests were instrumented with a single accelerometer (quiet standing, Timed Up and Go test); such tests carry quantitative information about impairments in posture, gait, and transitions (i.e., Sit-to-Walk, Walk-to-Sit, Turning). Satisfactory accuracies are obtained in the classification of PD subjects by using an ad hoc wrapper feature selection technique.

Luca Palmerini, Sabato Mellone, Guido Avanzolini, Franco Valzania, Lorenzo Chiari
False Positive Reduction in Detector Implantation

The development of a detection system is normally driven to achieve good detection rates. In most cases, a good detection rate involves a number of false positive decisions. However, the false positive rate is ultimately what decides if the detection system is effective or not. Another aspect to consider in automatic detection systems is the time to analyse an image until a decision is made. Viola & Jones proposed a cascade detector that achieves good detection and false positive rates at high speed. Some authors have proposed modifications to the cascade detector in order to improve the detection rate while maintaining the same false positive rate. However, during the implantation of the system we consistently find a large number of false positive detections due to the lack of knowledge about the newly acquired images. In this work, we propose a parallel cascade detector that gradually incorporates these new false positives to achieve an acceptable false positive rate. The second cascade detector is built using the new false positive detection images and the original true positive images during the implantation period. The proposed parallel scheme reduces the false positive rate of the system at roughly the same speed.

Noelia Vállez, Gloria Bueno, Oscar Déniz

Semantic Technology II

Redundant Elements in SNOMED CT Concept Definitions

While redundant elements in SNOMED CT concept definitions are harmless from a logical point of view, they unnecessarily make concept definitions of typically large ontologies such as SNOMED CT hard to construct and to maintain. In this paper, we apply a fully automated method to detect intra-axiom redundancies in SNOMED CT. We systematically analyse the completeness and soundness of the results of our method by examining the identified redundant elements. In the absence of a gold standard, we check whether our method identifies concepts that are likely to contain redundant elements because they become equivalent to their stated subsumer when they are replaced by a fully defined concept with the same definition. To evaluate soundness, we remove all identified redundancies, and test whether the logical closure is preserved by comparing the concept hierarchy to the one of the official SNOMED CT distribution. We found that 35,010 of the 296,433 SNOMED CT concepts (12%) contain redundant elements in their definitions, and that the results of our method are sound and complete with respect to our partial evaluation. We recommend freeing the stated form from these redundancies. In the future, knowledge modellers should be supported by being pointed to newly introduced redundancies.

Kathrin Dentler, Ronald Cornet
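The core notion of an intra-axiom redundancy above is a stated parent that is already implied by a more specific one. A toy illustration over a made-up mini-hierarchy (SNOMED CT definitions are richer EL++ axioms checked with a reasoner, not this simple is-a walk):

```python
# Toy detection of a redundant conjunct in a concept definition: a stated
# parent is redundant if another stated parent is more specific.
# The hierarchy below is a hypothetical illustration, not SNOMED CT content.

PARENTS = {  # child -> set of direct parents in a toy is-a hierarchy
    "bacterial pneumonia": {"pneumonia"},
    "pneumonia": {"lung disease"},
    "lung disease": {"disease"},
}

def subsumes(general, specific):
    # True if `general` is reachable from `specific` via is-a links.
    if general == specific:
        return True
    return any(subsumes(general, p) for p in PARENTS.get(specific, ()))

def redundant_conjuncts(definition):
    # A conjunct is redundant when some other conjunct already entails it.
    return {g for g in definition
            for s in definition
            if g != s and subsumes(g, s)}

print(redundant_conjuncts({"bacterial pneumonia", "lung disease"}))
```

Here "lung disease" is redundant because "bacterial pneumonia" already entails it; removing it leaves the logical closure unchanged, which is exactly the soundness test the authors run at SNOMED CT scale.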
Medical Ontology Validation through Question Answering

Medical ontology construction is an interactive process that requires the collaboration of both ICT and medical experts. The complexity of the medical domain and the formal description languages makes this collaboration a time-consuming and error-prone task. In this paper, we define an ontology validation method that hides the complexity of the formal description languages behind a question-answering game. The proposed approach differs from "classic" logical-consistency validation approaches and tackles the validation of the domain conceptualization. Reasoning techniques and verbalization methods are used to transform statements inferred from ontologies into natural language questions. The answers of the domain experts to these questions are used to validate and improve the ontology by identifying where it needs to be modified. The validation system then automatically performs the ontology updates needed to correct the detected errors.

Asma Ben Abacha, Marcos Da Silveira, Cédric Pruski
Lexical Characterization and Analysis of the BioPortal Ontologies

The increasing interest of the biomedical community in ontologies can be exemplified by the availability of hundreds of biomedical ontologies and controlled vocabularies, and by the international recommendations and efforts suggesting that ontologies should play a critical role in achieving semantic interoperability in healthcare. However, many of the available biomedical ontologies are rich in human-understandable labels but less rich in machine-processable axioms, so their effectiveness for supporting advanced data analysis processes is limited. In this context, developing methods for analysing the labels and deriving axioms from them would help make biomedical ontologies more useful. In fact, our recent work revealed that exploiting the regularities and structure of the labels could contribute to such axiomatic enrichment.

In this paper, we present an approach for analysing and characterising biomedical ontologies from a lexical perspective, that is, by analysing the structure and content of their labels. This study has several goals: (1) characterising the ontologies by the patterns found in their labels; (2) identifying which ontologies would be most appropriate for label-based enrichment processes; and (3) inspecting how ontology re-use is addressed for patterns found in more than one ontology.

Our analysis method has been applied to BioPortal, which is likely the most popular repository of biomedical ontologies, containing more than two hundred resources. We found high redundancy in the ontologies' labels, that the content and structure of many of these labels could usefully be exploited, and that re-use is not always performed as it should be.

Manuel Quesada-Martínez, Jesualdo Tomás Fernández-Breis, Robert Stevens
Ontology-Based Reengineering of the SNOMED CT Context Model

SNOMED CT is a terminology system partially built on formal ontological principles. Although its on-going redesign efforts increasingly consider principles of formal ontology, SNOMED CT's top-level categories and relations still reflect the legacy of its predecessors rather than formal ontological principles. This is apparent in its Context Model, which blends characteristics of information models with characteristics of ontologies. We propose a reengineering of the SNOMED CT Context Model formulated with ontology design patterns based on the BioTopLite upper ontology. Our analysis yields a clear division between clinical situations in a strict sense and information artefacts that denote clinical situations.

Catalina Martínez-Costa, Stefan Schulz
Using a Cross-Language Approach to Acquire New Mappings between Two Biomedical Terminologies

The exploitation of clinical reports for generating alerts relies in particular on the alignment of the dedicated terminologies, i.e., MedDRA (used in the pharmacovigilance area) and SNOMED International (recently adopted in France for encoding clinical documents). In this context, we propose a cross-language approach for automatically acquiring alignments between terms from MedDRA and SNOMED International. Our hypothesis was that using additional languages could help complement the mappings obtained between French terms. Our approach is based on a lexical method for aligning MedDRA terms with those from SNOMED International. The concomitant use of multiple languages resulted in several hundred new alignments and successfully validated or disambiguated some of these alignments.

Fleur Mougin, Natalia Grabar
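The lexical matching step underlying such an alignment can be illustrated with a minimal sketch: normalise each terminology label (lowercasing, accent stripping, whitespace collapsing) and match terms whose normalised labels coincide. The normalisation choices and function names here are illustrative assumptions, not the paper's actual pipeline.

```python
import unicodedata

def normalise(label):
    """Lowercase, strip accents, and collapse whitespace -- a typical
    preprocessing step for lexical alignment of terminology labels."""
    label = unicodedata.normalize("NFKD", label.lower())
    label = "".join(ch for ch in label if not unicodedata.combining(ch))
    return " ".join(label.split())

def lexical_alignments(source_terms, target_terms):
    """Map each source term to the target terms that share an identical
    normalised label; running this per language and pooling the results
    mirrors the cross-language idea."""
    index = {}
    for term in target_terms:
        index.setdefault(normalise(term), []).append(term)
    return {s: index[normalise(s)]
            for s in source_terms if normalise(s) in index}
```

Accent-insensitive matching is what makes the multilingual setting productive: labels that differ only in diacritics or casing across terminology releases still align.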

Temporal Data Visualization and Analysis

Clinical Time Series Prediction with a Hierarchical Dynamical System

In this work we develop and test a novel hierarchical framework for modeling and learning multivariate clinical time series data. Our framework combines two modeling approaches, Linear Dynamical Systems (LDS) and Gaussian Processes (GP), and can model time series of varying length with irregularly sampled observations. We test our framework on the problem of learning clinical time series data from the complete blood count panel, and show that it outperforms alternative time series models in terms of predictive accuracy.

Zitao Liu, Milos Hauskrecht
Extraction, Analysis, and Visualization of Temporal Association Rules from Interval-Based Clinical Data

Temporal association rules have recently been applied to interval-based temporal clinical data to discover complex temporal relationships. In this paper, we first propose a refinement of the data-mining algorithm proposed by Sacchi et al. (2007) for the extraction of temporal association rules, improving the algorithm's complexity when rule support is anti-monotone. We then address the non-trivial problem of displaying and visually analysing this kind of data through an OLAP-based multidimensional model, and propose a visualization solution explicitly dealing with temporal association rules.

Carlo Combi, Alberto Sabaini
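The anti-monotonicity property the refinement exploits — if a pattern is infrequent, every superset of it is infrequent too — is the classic Apriori pruning idea. A minimal sketch on plain itemsets (the paper works on interval-based temporal patterns, which this deliberately simplifies away):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori-style mining: anti-monotone support means any superset of
    an infrequent itemset is also infrequent, so it is pruned early."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    current = [frozenset([i]) for i in items]
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = [c for c, cnt in counts.items() if cnt / n >= min_support]
        for c in survivors:
            frequent[c] = counts[c] / n
        # Candidate generation: join k-itemsets, keep only candidates
        # whose every k-subset survived (the anti-monotone pruning step).
        current, seen = [], set()
        for a, b in combinations(survivors, 2):
            cand = a | b
            if len(cand) == k + 1 and cand not in seen:
                seen.add(cand)
                if all(frozenset(s) in frequent for s in combinations(cand, k)):
                    current.append(cand)
        k += 1
    return frequent
```

The pruning in the candidate-generation step is what keeps the search tractable: no support count is ever computed for a pattern with an infrequent subset.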
Learning to Identify Inappropriate Antimicrobial Prescriptions

Inappropriate antimicrobial prescribing is a major clinical problem and health concern. Several hospitals rely on automated surveillance to achieve hospital-wide antimicrobial optimization. The main challenge in implementing these systems lies in acquiring and updating their knowledge. In this paper, we discuss a surveillance system which can acquire new rules and improve its knowledge base. Our system uses an algorithm based on instance-based learning and rule induction to discover rules for inappropriate prescriptions. The algorithm uses temporal abstraction to extract a meaningful time interval representation from raw clinical data, and applies nearest neighbor classification with a distance function on both temporal and non-temporal parameters. The algorithm is able to discover new rules for early switch from intravenous to oral antimicrobial therapy from real clinical data.

Mathieu Beaudoin, Froduald Kabanza, Vincent Nault, Louis Valiquette
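The nearest-neighbour classification over temporal and non-temporal parameters could be sketched as below. The distance components, attribute names, and weighting are hypothetical illustrations; the paper's actual feature set and distance function are not given in the abstract.

```python
def mixed_distance(a, b, alpha=0.5):
    """Weighted sum of a temporal component (difference of treatment
    durations, in days) and a non-temporal component (count of mismatched
    categorical attributes). alpha and the attributes are illustrative."""
    temporal = abs(a["duration_days"] - b["duration_days"])
    non_temporal = sum(a[k] != b[k] for k in ("drug", "route"))
    return alpha * temporal + (1 - alpha) * non_temporal

def nearest_neighbour(query, labelled_cases):
    """1-NN: label the query prescription with the label of the closest
    stored case under the mixed distance."""
    case, label = min(labelled_cases,
                      key=lambda cl: mixed_distance(query, cl[0]))
    return label
```

In practice the raw clinical data would first pass through temporal abstraction to produce the interval representation that these attributes summarise.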
An Approach for Mining Care Trajectories for Chronic Diseases

With the increasing burden of chronic illnesses, administrative health care databases hold valuable information that could be used to monitor and assess the processes shaping the trajectories of care of chronic patients. In this context, temporal data mining methods are promising tools, though they lack flexibility in addressing the complex nature of medical events. Here, we present a new algorithm able to extract patient trajectory patterns at different levels of granularity by relying on external taxonomies. We demonstrate the value of our approach with an analysis of trajectories of care for colorectal cancer, using data from the French casemix information system.

Elias Egho, Nicolas Jay, Chedy Raïssi, Gilles Nuemi, Catherine Quantin, Amedeo Napoli
Similarity Measuring between Patient Traces for Clinical Pathway Analysis

Clinical pathways leave traces, described as activity sequences arising from a mixture of various latent treatment behaviors. Measuring similarities between patient traces can profitably be exploited as a basis for providing insights into the pathways, complementing existing clinical pathway analysis techniques, which mainly look at aggregated data from an external perspective. In this paper, a probabilistic graphical model, Latent Dirichlet Allocation, is employed to discover latent treatment behaviors in patient traces, so that similarities between pairs of patient traces can be measured on the basis of their underlying behavioral topic features. The presented method, as a basis for further tasks in clinical pathway analysis, is evaluated on a real-world dataset collected from a Chinese hospital.

Zhengxing Huang, Xudong Lu, Huilong Duan
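The core idea — comparing two traces by their inferred topic mixtures rather than their raw activity sequences — reduces to a similarity between probability vectors. Cosine similarity is one common choice for such vectors; the abstract does not name the exact measure used, so treat this as an illustrative sketch.

```python
import math

def cosine_similarity(p, q):
    """Similarity between two traces represented as topic-probability
    vectors (e.g. inferred by LDA); 1.0 means identical topic mixtures,
    0.0 means no topics in common."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = (math.sqrt(sum(a * a for a in p))
            * math.sqrt(sum(b * b for b in q)))
    return dot / norm if norm else 0.0
```

Two traces with very different activity orderings can still score high if LDA assigns them similar mixtures of treatment-behavior topics, which is exactly the robustness the latent representation buys.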

Natural Language Processing

Instantiating Interactive Narratives from Patient Education Documents

In this paper, we present a proof-of-concept demonstrator of an Interactive Narrative for patient education. Traditionally, patient education documents are produced by health agencies, yet these documents can be challenging to understand for a large fraction of the population. In contrast, an Interactive Narrative supports a game-like exploration of the situations described in patient education documents, which should facilitate understanding, whilst also familiarising patients with real-world situations. A specific feature of our prototype is that its plan-based narrative representations can be instantiated in part from the original patient education document, using NLP techniques. In the paper we introduce our interactive narrative techniques and follow this with a discussion of specific issues in text interpretation related to the occurrence of clinical actions. We then suggest mechanisms to generate direct or indirect representations of such actions in the virtual world as part of Interactive Narrative generation.

Fred Charles, Marc Cavazza, Cameron Smith, Gersende Georg, Julie Porteous
Added-Value of Automatic Multilingual Text Analysis for Epidemic Surveillance

The early detection of disease outbreaks is an important objective of epidemic surveillance. Web news are one source of information for detecting epidemic events as early as possible, but analysing the tens of thousands of articles published daily is costly. Recently, automatic systems have been dedicated to epidemiological surveillance. The main issue for these systems is to process more languages at a limited cost. However, existing systems mainly process major languages (English, French, Russian, Spanish…). Thus, when the first news item reporting a disease is in a minor language, the timeliness of event detection suffers. In this paper, we test an automatic style-based method designed to fill the gaps of existing automatic systems; it is parsimonious in resources and specifically designed for multilingual settings. The events detected by the human-moderated ProMED-mail between November 2011 and January 2012 are used as a reference dataset and compared to events detected in 17 languages by the DAnIEL system from web articles of the same time window. We show how the ability to process press articles in less widely spoken languages allows quicker detection of epidemic events in some regions of the world.

Gaël Lejeune, Romain Brixtel, Charlotte Lecluze, Antoine Doucet, Nadine Lucas
An Approach for Query-Focused Text Summarisation for Evidence Based Medicine

We present an approach for extractive, query-focused, single-document summarisation of medical text. Our approach utilises a combination of target-sentence-specific and target-sentence-independent statistics derived from a corpus specialised for summarisation in the medical domain. We incorporate domain knowledge via the application of multiple domain-specific features, and we customise the answer extraction process for different question types. The use of carefully selected domain-specific features enables our summariser to generate content-rich extractive summaries, and an automatic evaluation of our system reveals that it outperforms other baseline and benchmark summarisation systems with a percentile rank of 96.8%.

Abeed Sarker, Diego Mollá, Cécile Paris
Clustering of Medical Publications for Evidence Based Medicine Summarisation

We present a study of the clustering properties of medical publications for the purpose of Evidence Based Medicine summarisation. Given a dataset of documents that have been manually assigned to groups related to clinical answers, we apply K-Means clustering and verify that the documents can be clustered reasonably well. We discuss the implications of such clustering for natural language processing tasks in Evidence Based Medicine.

Sara Faisal Shash, Diego Mollá
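K-Means itself is standard Lloyd's-algorithm iteration over dense feature vectors (for documents, typically tf-idf representations). A minimal self-contained sketch, with a deterministic initialisation chosen for illustration (the paper's preprocessing and initialisation are not specified in the abstract):

```python
def kmeans(points, k, iters=20):
    """Plain K-Means (Lloyd's algorithm) on dense feature vectors, e.g.
    tf-idf representations of abstracts. Initialises centroids from the
    first k points for determinism (random init is more usual)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return assign, centroids
```

Checking how well the resulting clusters reproduce the manually assigned answer groups (e.g. via purity or adjusted Rand index) is the kind of verification the study describes.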
Classifying Measurements in Dictated, Free-Text Radiology Reports

Radiological measurements (e.g., ‘3.2 x 1.4 cm’) are the predominant type of quantitative data in free-text radiology reports. We report on the development and evaluation of a classifier that labels measurement descriptors with the exam they refer to: the current and/or a prior exam. Our classifier aggregates regular expressions as binary features in a maximum entropy model. It achieves an average F-measure of 0.942 on 2,000 annotated instances, compared to 0.795 for a rule-based baseline algorithm. Potential applications and routes for future work are discussed.

Merlijn Sevenster
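The feature-extraction side of such a classifier can be sketched as follows: each regular expression fires as a 0/1 feature that a maximum entropy (logistic regression) model would then weight. The patterns and feature names below are hypothetical illustrations, not the paper's actual feature set.

```python
import re

# Hypothetical regular expressions; the paper's real feature set is not given.
FEATURE_PATTERNS = {
    "has_measurement": r"\d+(\.\d+)?\s*(x\s*\d+(\.\d+)?\s*)?(mm|cm)",
    "mentions_prior": r"\b(previous(ly)?|prior|earlier)\b",
    "mentions_current": r"\b(now|current(ly)?|today)\b",
    "comparison_verb": r"\b(increased|decreased|unchanged|stable)\b",
}

def extract_features(sentence):
    """Binary regex features of the kind that could feed a maximum
    entropy classifier deciding current vs. prior exam."""
    s = sentence.lower()
    return {name: int(re.search(pattern, s) is not None)
            for name, pattern in FEATURE_PATTERNS.items()}
```

Aggregating many such cheap binary cues and letting the model learn their weights is what lets the statistical classifier outperform a hand-written rule cascade.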
Backmatter
Metadata
Title
Artificial Intelligence in Medicine
Editors
Niels Peek
Roque Marín Morales
Mor Peleg
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-38326-7
Print ISBN
978-3-642-38325-0
DOI
https://doi.org/10.1007/978-3-642-38326-7
