2019 | Book

Data Science for Healthcare

Methodologies and Applications

edited by: Ph.D. Sergio Consoli, Prof. Diego Reforgiato Recupero, Prof. Milan Petković

Publisher: Springer International Publishing

About this book

This book seeks to promote the exploitation of data science in healthcare systems. The focus is on advancing the automated analytical methods used to extract new knowledge from data for healthcare applications. To do so, the book draws on several interrelated disciplines, including machine learning, big data analytics, statistics, pattern recognition, computer vision, and Semantic Web technologies, and focuses on their direct application to healthcare.
Building on three tutorial-like chapters on data science in healthcare, the following eleven chapters highlight success stories on the application of data science in healthcare, where data science and artificial intelligence technologies have proven to be very promising.
This book is primarily intended for data scientists involved in the healthcare or medical sector. By reading this book, they will gain essential insights into the modern data science technologies needed to advance innovation for both healthcare businesses and patients. A basic grasp of data science is recommended in order to fully benefit from this book.

Table of contents

Frontmatter

Challenges and Basic Technologies

Frontmatter
Data Science in Healthcare: Benefits, Challenges and Opportunities
Abstract
The advent of digital medical data has brought an exponential increase in information available for each patient, allowing for novel knowledge generation methods to emerge. Tapping into this data brings clinical research and clinical practice closer together, as data generated in ordinary clinical practice can be used towards rapid-learning healthcare systems, continuously improving and personalizing healthcare. In this context, the recent use of Data Science technologies for healthcare is providing mutual benefits to both patients and medical professionals, improving prevention and treatment for several kinds of diseases. However, the adoption and usage of Data Science solutions for healthcare still require social capacity, knowledge and higher acceptance. The goal of this chapter is to provide an overview of needs, opportunities, recommendations and challenges of using (Big) Data Science technologies in the healthcare sector. This contribution is based on a recent whitepaper (http://www.bdva.eu/sites/default/files/Big%20Data%20Technologies%20in%20Healthcare.pdf) provided by the Big Data Value Association (BDVA) (http://www.bdva.eu/), the private counterpart to the EC to implement the BDV PPP (Big Data Value PPP) programme, which focuses on the challenges and impact that (Big) Data Science may have on the entire healthcare chain.
Ziawasch Abedjan, Nozha Boujemaa, Stuart Campbell, Patricia Casla, Supriyo Chatterjea, Sergio Consoli, Cristobal Costa-Soria, Paul Czech, Marija Despenic, Chiara Garattini, Dirk Hamelinck, Adrienne Heinrich, Wessel Kraaij, Jacek Kustra, Aizea Lojo, Marga Martin Sanchez, Miguel A. Mayer, Matteo Melideo, Ernestina Menasalvas, Frank Moller Aarestrup, Elvira Narro Artigot, Milan Petković, Diego Reforgiato Recupero, Alejandro Rodriguez Gonzalez, Gisele Roesems Kerremans, Roland Roller, Mario Romao, Stefan Ruping, Felix Sasaki, Wouter Spek, Nenad Stojanovic, Jack Thoms, Andrejs Vasiljevs, Wilfried Verachtert, Roel Wuyts
Introduction to Classification Algorithms and Their Performance Analysis Using Medical Examples
Abstract
In this chapter, we give an introduction to classification algorithms and the metrics that are used to quantify and visualize their performance. We first briefly explain what we mean by a classification algorithm, and, as an example, we describe in more detail the naive Bayesian classification algorithm. Using the concept of a confusion matrix, we next define the various performance metrics that can be derived from it, including sensitivity and specificity, which define the two dimensions of ROC space. We next argue that correctly evaluating the performance of a classification algorithm requires taking into account the conditions in which the algorithm has to operate in practice. These so-called operating conditions consist of two elements: class skew and cost skew. We show that both elements can be combined into a single parameter that defines cost, and that iso-cost curves are straight lines in ROC space.
Additionally, as alternatives to ROC space, we briefly review two other spaces, namely, precision-recall space and cost-curve space. The latter was introduced by Drummond and Holte (Explicitly representing expected cost: an alternative to ROC representation. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, KDD 2000, Boston, pp 198–207, 2000; Mach Learn 65(1):95–130, 2006). To illustrate the material we present, we will use a number of examples taken from the medical domain.
Jan Korst, Verus Pronk, Mauro Barbieri, Sergio Consoli
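As an illustration of the metrics named in this abstract, the following is a minimal sketch, not the chapter's own notation or code. It assumes a binary confusion matrix and the textbook expected-cost formulation; the names `ConfusionMatrix`, `iso_cost_slope`, and `expected_cost`, and the parameters `prevalence`, `cost_fp`, and `cost_fn`, are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper for illustration)."""
    tp: int  # diseased patients correctly flagged
    fn: int  # diseased patients missed
    fp: int  # healthy patients incorrectly flagged
    tn: int  # healthy patients correctly cleared

    @property
    def sensitivity(self) -> float:
        # True positive rate: the vertical axis of ROC space.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate; the horizontal ROC axis is 1 - specificity.
        return self.tn / (self.tn + self.fp)


def iso_cost_slope(prevalence: float, cost_fp: float, cost_fn: float) -> float:
    """Slope of the straight iso-cost lines in ROC space.

    Class skew (prevalence) and cost skew (cost_fp versus cost_fn)
    collapse into this single number.
    """
    return ((1.0 - prevalence) * cost_fp) / (prevalence * cost_fn)


def expected_cost(cm: ConfusionMatrix, prevalence: float,
                  cost_fp: float, cost_fn: float) -> float:
    """Expected misclassification cost per patient under the given
    operating conditions (assumed textbook formulation)."""
    false_negative_rate = 1.0 - cm.sensitivity
    false_positive_rate = 1.0 - cm.specificity
    return (prevalence * false_negative_rate * cost_fn
            + (1.0 - prevalence) * false_positive_rate * cost_fp)
```

Two classifiers whose ROC points lie on the same line with slope `iso_cost_slope(...)` have the same expected cost, which is why changing the prevalence or the cost ratio can change which classifier is preferable.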
The Role of Deep Learning in Improving Healthcare
Abstract
Healthcare is transforming through the adoption of information technologies (IT) and digitalization. Machine learning (ML) and artificial intelligence (AI) are two of the technologies that are leading this transformation. In this chapter we focus on Deep Learning (DL), a subfield of ML that relies on deep artificial neural networks to deliver breakthroughs in long-standing AI problems. DL is about working with high-dimensional data (e.g., images, speech recordings, natural language) and learning efficient representations that allow for building successful models. We present a structured overview of DL methods applied to healthcare problems, organized by the suitability of the different technologies for the available modalities of healthcare data. This data-centric perspective reflects the data-driven nature of DL methods and allows side-by-side comparison with different domains in healthcare. Challenges in the broad adoption of DL are commonly related to some of its main drawbacks, particularly lack of interpretability and transparency. We discuss the drawbacks and limitations of DL technology that specifically come to light in the domain of healthcare. We also address the need for a considerable amount of data and annotations to successfully build these models, which can be a particularly expensive and time-consuming effort. Overall, the chapter offers insights into existing applications of DL to healthcare, their suitability for specific types of data, and their limitations.
Stefan Thaler, Vlado Menkovski

Specific Technologies and Applications

Frontmatter
Making Effective Use of Healthcare Data Using Data-to-Text Technology
Abstract
Healthcare organizations are in a continuous effort to improve health outcomes, reduce costs, and enhance patient experience of care. Data is essential to measure and help achieve these improvements in healthcare delivery. Consequently, a data influx from various clinical, financial, and operational sources is now overtaking healthcare organizations and their patients. The effective use of this data, however, is a major challenge. Clearly, text is an important medium to make data accessible. Financial reports are produced to assess healthcare organizations on some key performance indicators to steer their healthcare delivery. Similarly, at a clinical level, data on patient status is conveyed by means of textual descriptions to facilitate patient review, shift handover, and care transitions. Likewise, patients are informed about data on their health status and treatments via text, in the form of reports, or via e-health platforms by their doctors. Unfortunately, such text is the outcome of a highly labor-intensive process if it is done by healthcare professionals. It is also prone to incompleteness and subjectivity and hard to scale up to different domains, wider audiences, and varying communication purposes. Data-to-text is a recent breakthrough technology in artificial intelligence which automatically generates natural language in the form of text or speech from data. This chapter provides a survey of data-to-text technology, with a focus on how it can be deployed in a healthcare setting. It will (1) give an up-to-date synthesis of data-to-text approaches, (2) give a categorized overview of use cases in healthcare, (3) seek to make a strong case for evaluating and implementing data-to-text in a healthcare setting, and (4) highlight recent research challenges.
Steffen Pauws, Albert Gatt, Emiel Krahmer, Ehud Reiter
Clinical Natural Language Processing with Deep Learning
Abstract
The emergence and proliferation of electronic health record (EHR) systems has resulted in large volumes of clinical free-text documents available across healthcare networks. The huge amount of data inspires research and development focused on novel clinical natural language processing (NLP) solutions to optimize clinical care and improve patient outcomes. In recent years, deep learning techniques have demonstrated superior performance over traditional machine learning (ML) techniques for various general-domain NLP tasks, e.g., language modeling, part-of-speech (POS) tagging, named entity recognition, paraphrase identification, and sentiment analysis. Clinical documents pose unique challenges compared to general-domain text due to the widespread use of acronyms and nonstandard clinical jargon by healthcare providers, inconsistent document structure and organization, and the requirement for rigorous de-identification and anonymization to ensure patient data privacy. This tutorial chapter will present an overview of how deep learning techniques can be applied to solve NLP tasks in general, followed by a literature survey of existing deep learning algorithms applied to clinical NLP problems. Finally, we include a description of various deep learning-driven clinical NLP applications developed at the artificial intelligence (AI) lab in Philips Research in recent years, such as diagnostic inferencing from unstructured clinical narratives, relevant biomedical article retrieval based on clinical case scenarios, clinical paraphrase generation, adverse drug event (ADE) detection from social media, and medical image caption generation.
Sadid A. Hasan, Oladimeji Farri
Ontology-Based Knowledge Management for Comprehensive Geriatric Assessment and Reminiscence Therapy on Social Robots
Abstract
MARIO is an assistive robot that has to support a set of knowledge-intensive tasks aimed at increasing autonomy and reducing loneliness in people with dementia and supporting caregivers in their activity to assess patients’ cognitive status. Examples of knowledge-intensive tasks are the comprehensive geriatric assessment (CGA) and the delivery of reminiscence therapy. In order to enable these tasks, MARIO features a set of abilities implemented by pluggable software applications. MARIO’s abilities contribute to and benefit from a common knowledge management framework. For example, the ability associated with the CGA retrieves questions to be posed to the patient from the framework and stores the obtained answers and associated relevant metadata. In this work we present the MARIO knowledge management software framework, which combines robotics with ontology-based approaches and Semantic Web technologies. It consists of (1) a set of interconnected and modularized ontologies, meant to model all knowledge areas that are relevant for MARIO abilities, and (2) a set of software interfaces that provide high-level access to the ontology network and its associated knowledge base. Finally, we demonstrate how the knowledge management framework supports the applications for CGA and reminiscence therapy, implemented on top of the knowledge base.
Luigi Asprino, Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Diego Reforgiato Recupero, Alessandro Russo
Assistive Robots for the Elderly: Innovative Tools to Gather Health Relevant Data
Abstract
Robots can be key players in the healthcare digital revolution that is underway. More specifically, assistive robots are going to profoundly affect the way healthcare is delivered. In the research and commercial field, robotic technologies have been used for physical and cognitive rehabilitation, surgery, telemedicine, drug delivery, and patient management. Robots are currently used across a range of environments, including hospitals, homes, schools, and nursing homes. Healthcare robotics is an exciting, emerging area that can benefit all stakeholders across a wide range of settings. The objective of this chapter is to investigate how this can happen given the current state of the art in the field. We will focus our argument on how they can be used as valuable agents for the acquisition of novel data, relevant to the healthcare and well-being domain.
Alessandra Vitanza, Grazia D’Onofrio, Francesco Ricciardi, Daniele Sancarlo, Antonio Greco, Francesco Giuliani
Overview of Data Linkage Methods for Integrating Separate Health Data Sources
Abstract
Health data sources across healthcare service deliveries are notoriously disconnected, hampering good use of data. Hospitals, family doctors, pharmacists, and health insurers all have their own data, while the data may contain information about the same patients. Also, industries offering healthcare and wellness services host and maintain their own data repositories about the patients on their services. Lastly, governmental organizations collect register and survey data on public health, healthcare utilization, and health outcomes. Linking health data of individuals, events, and locations at various aggregation levels from different sources can be extremely insightful. More information can be pulled from linked data than from every data source separately. Bringing patient data together for which unique personal identifiers exist, such as social security numbers, is rather straightforward. In many practices, such identifiers are simply lacking, meaning that one has to resort to variables that are not necessarily unique to a person, which makes the task of linking data far more challenging. To make things worse, these linking variables come with errors due to misspellings, coding differences, or transcription mistakes. Nevertheless, data linkage needs to be done flawlessly, as connecting the wrong patient records or missing valuable connections between patient records can result in biased analyses on linked datasets. This chapter provides a state-of-the-art survey of data linkage technology within healthcare. It will give (1) an overview of the various methods in data linkage, including deterministic and probabilistic approaches, and (2) a synthesis of healthcare use cases in which data linkage is essential, with a discussion on the legal and privacy challenges of using data linkage in healthcare.
Ana Kostadinovska, Muhammad Asim, Daniel Pletea, Steffen Pauws
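The contrast between deterministic and probabilistic linkage mentioned in this abstract can be sketched as follows. This is an illustrative example only, assuming the classic Fellegi-Sunter style of agreement weights rather than any method specific to the chapter. The field names and the m/u probabilities in `M_U` are invented for the example, as are the function names.

```python
import math

# Assumed per-field probabilities (invented for illustration):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
M_U = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "postcode": (0.90, 0.05),
}


def deterministic_match(a: dict, b: dict, key: str = "ssn") -> bool:
    """Link two records only if a unique identifier is present and equal."""
    return a.get(key) is not None and a.get(key) == b.get(key)


def match_weight(a: dict, b: dict, m_u: dict = M_U) -> float:
    """Sum of log2 likelihood ratios over the linking variables.

    Agreement on a field adds log2(m / u); disagreement adds
    log2((1 - m) / (1 - u)). Fields missing in either record are skipped.
    """
    total = 0.0
    for field, (m, u) in m_u.items():
        value_a, value_b = a.get(field), b.get(field)
        if value_a is None or value_b is None:
            continue
        if value_a == value_b:
            total += math.log2(m / u)
        else:
            total += math.log2((1.0 - m) / (1.0 - u))
    return total
```

In practice a pair would be accepted as a link above an upper threshold, rejected below a lower one, and sent for clerical review in between; the thresholds, like the probabilities above, would need to be estimated from the data at hand.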
A Flexible Knowledge-Based Architecture for Supporting the Adoption of Healthy Lifestyles with Persuasive Dialogs
Abstract
Automatically monitoring and supporting a healthy lifestyle is a recent research trend, fostered by the availability of low-cost monitoring devices, and it can significantly contribute to the prevention of chronic diseases deriving from incorrect diet and lack of physical activity. In this chapter, we present the use of Semantic Web technologies to build an architecture for supporting the monitoring of people and for persuading them to follow healthy lifestyles. Semantic technologies are used for modeling all relevant information and for fostering reasoning activities by combining user-generated data and domain knowledge. The output of reasoning activities feeds a dialog manager component in charge of engaging users with persuasive messages based on the user data, context, and goals. The architecture is designed to be easily portable across domains and has been preliminarily evaluated within a user study involving 119 users. Observed results demonstrated the effectiveness of the proposed solution and opened the possibility of adopting our platform in wider contexts.
Mauro Dragoni, Tania Bailoni, Rosa Maimone, Michele Marchesoni, Claudio Eccher
Visual Analytics for Classifier Construction and Evaluation for Medical Data
Abstract
Designing and optimizing classifiers for multidimensional mixed quantitative-and-categorical data is a challenging task. We present here a workflow and associated toolset that assists with this task, by providing the designer with insights into how the multidimensional input data is structured and how this structure influences the classification results. Our approach heavily relies on visual analytics for detecting relevant patterns in the input data, observing the distribution of classification errors, detecting and controlling the effect of feature selection on the classification results, and comparing in detail the performance of different classification techniques. We demonstrate the value of our approach on the concrete problem of building a classifier for predicting biochemical recurrence, indicating potential cancer relapse after prostate cancer treatment, from clinical patient data.
Jacek Kustra, Alexandru Telea
Data Visualization in Clinical Practice
Abstract
As health-care data is increasingly digitized and standardized not only for research purposes but also for clinical practice, opportunities for increased personalized medicine through big data analytics arise. However, practical limitations exist towards acceptance of data analytics models to be used in clinical practice. Traditionally, models (typically rule-based) are extensively validated before being taken up in practice. With the fast pace of development of new data, techniques, and devices, time-consuming external validation will often invalidate future application of a model, due to new or better diagnostic measurements or treatment techniques.
To accommodate for this fast pace of development, a more flexible way of model development is needed. This entails that certain levels of uncertainty need to be accepted in the external validity of the model, either because the model has not undergone thorough external validation or because circumstances have changed since the model was developed.
We can allow for the doctor to stay in charge of any inferences made from data through visualization instead of mere presentation of, e.g., risk scores or survival probabilities from a trained model. Absence of external validation requires that visualizations are easily interpretable: it should be clear how they were constructed (they should be as unbiased as possible), and the limitations of the underlying model of the data should be clearly presented to the user.
In this chapter, we present direct data visualization techniques, which adhere to these requirements, along with their limitations and directions for future research into readily interpretable, unbiased data visualizations for big data in health care.
Monique Hendriks, Charalampos Xanthopoulakis, Pieter Vos, Sergio Consoli, Jacek Kustra
Using Process Analytics to Improve Healthcare Processes
Abstract
Healthcare processes are inherently complex as each patient is unique and medical staff deviate from protocols, often for valid reasons. Event logs collected by modern process-aware (healthcare) information systems provide a wealth of data and can be used to analyze the adherence to these protocols. Process mining is a young research area combining data science (machine learning, data mining, etc.) and business process management. Its main contributions have been techniques for process discovery (the automatic learning of process models from event data) and conformance checking (aligning observed and modeled behavior). However, existing techniques face challenging issues discovering high-quality process models in a healthcare setting. In this chapter we introduce the key concepts of process mining such as event logs, process models, and process discovery. We then show the application of two recent process mining techniques on a public event data set from the healthcare domain to demonstrate how some of the common pitfalls can be overcome. A first presented technique projects data statistics on a process model, allowing the analysis of correlations between patient characteristics and the executed activities (e.g., type of treatment). A second technique analyzes process performance without a specific need for a process model by considering contextual information. It correlates process characteristics with process performance.
Bart Hompes, Prabhakar Dixit, Joos Buijs
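To make the notions of event log and process discovery in this abstract concrete, here is a minimal sketch that builds a directly-follows graph, a common starting point for discovery algorithms. It is not one of the two techniques presented in the chapter. The event tuple layout `(case_id, timestamp, activity)` and the function name `directly_follows` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often one activity directly follows another within a case.

    event_log is an iterable of (case_id, timestamp, activity) tuples,
    where a case typically corresponds to one patient episode.
    """
    # Group events by case so that each patient trace is handled separately.
    traces = defaultdict(list)
    for case_id, timestamp, activity in event_log:
        traces[case_id].append((timestamp, activity))

    counts = Counter()
    for events in traces.values():
        # Order each trace by timestamp, then count consecutive pairs.
        ordered = [activity for _, activity in sorted(events)]
        counts.update(zip(ordered, ordered[1:]))
    return counts


# Hypothetical two-patient log for illustration.
example_log = [
    ("p1", 1, "triage"),
    ("p1", 2, "lab test"),
    ("p1", 3, "discharge"),
    ("p2", 2, "discharge"),
    ("p2", 1, "triage"),
]
example_graph = directly_follows(example_log)
```

The resulting pair counts can be drawn as a graph whose edges show which activities follow each other and how often; deviations from a clinical protocol show up as unexpected edges.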
A Multi-Scale Computational Approach to Understanding Cancer Metabolism
Abstract
A first principles Nash equilibrium approach to modeling, simulation, and analysis of metabolic pathways is presented. The modeling framework is described in detail, and small examples illustrating mass and charge balancing, the inclusion of enzymatic reactions in the model, constraint linear independence, and allosteric inhibition are given in order to provide a tutorial for the reader. The methodology is then applied to the methionine salvage pathway in order to demonstrate that it can correctly capture the behavior of an important pathway in the study of cancer. It is shown that methylthioadenosine (MTA) accumulation as a result of the loss of activity of the enzyme S-methyl-5-thioadenosine phosphorylase (MTAP) is correctly predicted by the Nash equilibrium approach under tight regulation of adenine. Several examples are presented to elucidate the key ideas in modeling cancer metabolism using the Nash equilibrium approach.
Angelo Lucia, Peter A. DiMaggio
Leveraging Financial Analytics for Healthcare Organizations in Value-Based Care Environments
Abstract
The US healthcare system serves as an example case in this chapter for leveraging analytics for financial insights. The presented financial metrics and performance measures will be a vital part in the holistic framework of data-driven analytics and business intelligence in the context of population health management, an approach that aims to reduce the cost of healthcare while simultaneously holding the possibility of meaningfully improving quality and experience of care for defined patient populations. The methodology of population health analytics described here is in turn adaptable to other countries’ healthcare systems and data governance. Healthcare organizations will gain significant returns on investments in population health analytics solutions when they want to succeed in an increasingly value-driven reimbursement system where regulations and payment incentives are created to move away from the current fee-for-service models towards outcomes-based accountable care. As a result, healthcare organizations are incentivized and penalized financially around health outcomes, which means that it is paramount to see improvements in care quality and to lower spending for all stakeholders across the care continuum. Hence, healthcare organizations have a lot to gain in understanding their own financial performance in terms of costs, utilization, and compensations, their patient populations in terms of health outcomes, and their palette of intervention, outreach, and engagement opportunities. It is evident that the integration of population health management solutions with the data warehouse of health providers will increase demand for a new job type, the healthcare data scientist. This chapter introduces the reader to the domain-specific skillset of a healthcare financial data scientist. The general skillset of a data scientist is a prerequisite, and more advanced topics can be found throughout this book.
This chapter (1) starts with an overview of financing healthcare delivery, (2) uses well-documented administrative healthcare claims as a sample data source, (3) introduces medical coding classification systems and related sources, (4) provides methods for creating selected financial and clinical key performance indicators (KPIs) and drill-downs, and (5) provides a concept to visualize the results to the C-suite executives of provider organizations on a performance dashboard that can inform decision-making and the development of best practices.
Dieter Van de Craen, Daniele De Massari, Tobias Wirth, Jason Gwizdala, Steffen Pauws
Metadata
Title
Data Science for Healthcare
Edited by
Ph.D. Sergio Consoli
Prof. Diego Reforgiato Recupero
Prof. Milan Petković
Copyright year
2019
Electronic ISBN
978-3-030-05249-2
Print ISBN
978-3-030-05248-5
DOI
https://doi.org/10.1007/978-3-030-05249-2