
2017 | Book

Data Management and Analytics for Medicine and Healthcare

Third International Workshop, DMAH 2017, Held at VLDB 2017, Munich, Germany, September 1, 2017, Proceedings

About this book

This book constitutes the thoroughly refereed conference proceedings of the Third International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2017, in Munich, Germany, in September 2017, held in conjunction with the 43rd International Conference on Very Large Data Bases, VLDB 2017.

The 9 revised full papers presented together with 2 keynote abstracts were carefully reviewed and selected from 16 initial submissions. The papers are organized in topical sections on data privacy and trustability for electronic health records; biomedical data management and integration; online mining of health-related data; and clinical data analytics.

Table of Contents

Frontmatter

Data Privacy and Trustability for Electronic Health Records

Frontmatter
How Blockchain Could Empower eHealth: An Application for Radiation Oncology
(Extended Abstract)
Abstract
Electronic medical records (EMRs) contain critical, highly sensitive private healthcare information, and need to be frequently shared among peers. Blockchain provides a shared, immutable and transparent history of all the transactions to build applications with trust, accountability and transparency. This provides a unique opportunity to develop a secure and trustable EMR data management and sharing system using blockchain. In this paper, we discuss our perspectives on blockchain-based healthcare data management and present a prototype of a framework for managing and sharing EMR data for cancer patient care.
Alevtina Dubovitskaya, Zhigang Xu, Samuel Ryu, Michael Schumacher, Fusheng Wang

Biomedical Data Management and Integration

Frontmatter
On-Demand Service-Based Big Data Integration: Optimized for Research Collaboration
Abstract
Biomedical research requires distributed access, analysis, and sharing of data from various dispersed sources at Internet scale. Due to the volume and variety of big data, materialized data integration is often infeasible or too expensive once the costs of bandwidth, storage, maintenance, and management are included. Óbidos (On-demand Big Data Integration, Distribution, and Orchestration System) provides a novel on-demand integration approach for heterogeneous distributed data. Instead of integrating data from the data sources to build a complete data warehouse as the initial step, Óbidos employs a hybrid approach of virtual and materialized data integration. By allocating unique identifiers as pointers to virtually integrated data sets, Óbidos supports efficient data sharing among data consumers. We design Óbidos as a generic service-based data integration system, and implement and evaluate a prototype for multimodal medical data.
Pradeeban Kathiravelu, Yiru Chen, Ashish Sharma, Helena Galhardas, Peter Van Roy, Luís Veiga
CHIPS – A Service for Collecting, Organizing, Processing, and Sharing Medical Image Data in the Cloud
Abstract
Web browsers are increasingly used as middleware platforms offering a central access point for service provision. Using backend containerization, RESTful APIs, and distributed computing allows complex systems to be realized that address the needs of modern compute-intensive environments. In this paper, we present a web-based medical image data and information management software platform called CHIPS (Cloud Healthcare Image Processing Service). This cloud-based service allows for authenticated and secure retrieval of medical image data from resources typically found in hospitals, organizes and presents information in a modern feed-like interface, provides access to a growing library of plugins that process these data, allows for easy data sharing between users, and provides powerful 3D visualization and real-time collaboration. Image processing is orchestrated across additional cloud-based resources using containerization technologies.
Rudolph Pienaar, Ata Turk, Jorge Bernal-Rusiel, Nicolas Rannou, Daniel Haehn, P. Ellen Grant, Orran Krieger
High Performance Merging of Massive Data from Genome-Wide Association Studies
Abstract
Traditional data processing methods running on a single computer lack the scalability and efficiency needed to perform an ordered full outer join when merging a large number of individual Genome-Wide Association Studies (GWAS) data sets. Although the emergence of big data platforms such as Hadoop and Spark sheds light on this problem, the inefficiency of keeping data in total sorted order and the workload imbalance problem limit their performance. In this study, we designed and compared three new methodologies, based on MapReduce, HBase, and Spark respectively, to merge hundreds of individuals' VCF files by Single Nucleotide Polymorphism (SNP) location into a single TPED file. Our methodologies overcame the limitations stated above and considerably improved performance, with good scalability in both input size and computing resources.
Xiaobo Sun, Fusheng Wang, Zhaohui Qin
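To make the operation named in the abstract above concrete, the following is a minimal single-machine sketch of an ordered full outer join over per-individual records that are already sorted by SNP location. It is not the authors' MapReduce, HBase, or Spark method. The function names, the `(chromosome, position)` key layout, and the `"0 0"` placeholder for a missing genotype are assumptions made for illustration only.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tag(idx, source):
    """Attach the individual's index to each of its records."""
    for key, genotype in source:
        yield key, idx, genotype


def ordered_full_outer_join(sources, missing="0 0"):
    """Merge per-individual genotype streams that are sorted by key.

    sources: sequence of iterables, one per individual, each yielding
             ((chromosome, position), genotype) in ascending key order.
    Yields (key, row) in ascending key order, where row holds one genotype
    per individual and `missing` (an assumed placeholder) where an
    individual has no record at that location.
    """
    sources = list(sources)
    # k-way merge keeps the global order without loading everything at once.
    merged = heapq.merge(*(_tag(i, s) for i, s in enumerate(sources)))
    for key, group in groupby(merged, key=itemgetter(0)):
        row = [missing] * len(sources)
        for _, idx, genotype in group:
            row[idx] = genotype
        yield key, row


# Toy input: two individuals with partially overlapping SNP locations.
person_a = [((1, 100), "A A"), ((1, 300), "G G")]
person_b = [((1, 200), "C T"), ((1, 300), "G A")]
joined = list(ordered_full_outer_join([person_a, person_b]))
```

Every location seen in any input appears exactly once in the output, which is what distinguishes a full outer join from an inner join. Keeping this total order across many machines is the part the abstract identifies as difficult on distributed platforms.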
An Emerging Role for Polystores in Precision Medicine
Abstract
Medical data is organically heterogeneous, and it usually varies significantly in both size and composition. Yet, this data is also key to the recent and promising field of precision medicine, which focuses on identifying and tailoring appropriate medical treatments to the needs of individual patients, based on their specific conditions, medical history, lifestyle, genetics, and other individual factors. As we, and the database community at large, recognize that a “one size does not fit all” solution is required to work with such data, we present our observations based on our experiences and on applications in the field of precision medicine. We make the case for the use of a polystore architecture and show how it applies to precision medicine. We discuss the reference architecture, describe some of its critical components (array database), and discuss the specific types of analysis that directly benefit from this database architecture and the ways it serves the data.
Edmon Begoli, J. Blair Christian, Vijay Gadepally, Stavros Papadopoulos

Online Mining of Health Related Data

Frontmatter
Social Media Mining to Understand Public Mental Health
Abstract
In this paper, we apply text mining and topic modelling to understand public mental health. We focus on identifying common mental health topics across two anonymous social media platforms: Reddit and a mobile journalling/mood-tracking app. Furthermore, we analyze journals from the app to uncover relationships between topics, journal visibility (private vs. visible to other users of the app), and user-labelled sentiment. Our main findings are that (1) anxiety and depression are shared on both platforms; (2) users of the journalling app keep routine topics such as eating private, and these topics rarely appear on Reddit; and (3) sleep was a critical theme on the journalling app and had an unexpectedly negative sentiment.
Andrew Toulis, Lukasz Golab

Clinical Data Analytics

Frontmatter
Effects of Varying Sampling Frequency on the Analysis of Continuous ECG Data Streams
Abstract
A myriad of data is produced in intensive care units (ICUs) even over short periods of time. This data is frequently used for monitoring a patient's immediate health status, but not for real-time analysis, because of the technical challenges of processing such massive data in real time. Data storage is another challenge in making ICU data useful for retrospective studies. Therefore, it is important to know the minimal sampling frequency required to develop real-time analysis on ICU data and to develop a data storage plan. In this study, we applied the Probabilistic Symbolic Pattern Recognition (PSPR) method to the Paroxysmal Atrial Fibrillation (PAF) screening problem by analyzing electrocardiogram signals at different sampling frequencies varying from 128 Hz to 8 Hz. Our results show that, using the PSPR method, we can obtain a classification accuracy of 82.67% in identifying PAF subjects even when the test data is sampled at 8 Hz (73.33% for 128 Hz). This classification accuracy improved drastically to 92% when other descriptive features were used along with the PSPR features. The PSPR method's PAF screening ability at low sampling frequency indicates its potential for real-time analysis and wearable embedded computing applications.
Ruhi Mahajan, Rishikesan Kamaleswaran, Oguz Akbilgic
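As an illustration of what reducing the sampling frequency from 128 Hz to 8 Hz means for data volume, here is a minimal sketch of naive decimation (keeping every k-th sample). The abstract above does not say how the signals were resampled, so this technique, the function name `decimate`, and its parameters are assumptions. A practical pipeline would normally apply a low-pass filter first to limit aliasing.

```python
def decimate(samples, source_hz=128, target_hz=8):
    """Keep every k-th sample, where k = source_hz / target_hz.

    Naive decimation without anti-alias filtering (an illustrative
    assumption, not the resampling procedure used in the paper).
    """
    if target_hz <= 0 or source_hz % target_hz != 0:
        raise ValueError("source_hz must be a positive multiple of target_hz")
    step = source_hz // target_hz
    return samples[::step]


# One second of placeholder samples recorded at 128 Hz.
one_second = list(range(128))
reduced = decimate(one_second)
```

With these two rates, each second of signal shrinks from 128 values to 8, a 16-fold reduction in storage.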
Detection and Visualization of Variants in Typical Medical Treatment Sequences
Abstract
Electronic Medical Records (EMRs) are widely used in many large hospitals. EMRs can reduce the cost of managing medical histories, and can also improve medical processes through the secondary use of these records. Medical workers, including doctors, nurses, and technicians, generally use clinical pathways as their guidelines for typical sequences of medical treatments. The medical workers traditionally generate the clinical pathways themselves based on their experience. It is helpful for the medical workers to verify the correctness of existing clinical pathways, or to modify them, by comparing them with the frequent sequential patterns in medical orders computationally extracted from EMR logs. In our previous work, treating the EMR as a database and a typical clinical pathway as a frequent sequential pattern in that database, we proposed a method to extract typical clinical pathways as frequent sequential patterns with treatment time information from EMR logs. These patterns tend to contain variants that are influential in verification and modification. In this paper, we propose an approach for detecting the variants in frequent sequential patterns of medical orders while considering time information. Since it is important to provide visual views of these variants so that the results can be used effectively by the medical workers, we also develop an interactive graphical interface system for visualizing the variants in clinical pathways. The results of applying the approach to actual EMR logs in a university hospital are reported.
Yuichi Honda, Muneo Kushima, Tomoyoshi Yamazaki, Kenji Araki, Haruo Yokota
Umedicine: A System for Clinical Practice Support and Data Analysis
Abstract
Recording patient clinical data in a comprehensive and easy way is very important for health care providers. However, although there are information systems to facilitate the storage of and access to patient data, many records are still on paper. Even when data is stored electronically, systems are often complex to use and do not provide means to gather statistical information about a population of patients, thus limiting the usefulness of the data. Physicians often give up searching for relevant information to support their medical decisions because the task is too time-consuming. This paper proposes Umedicine, a web-based software application in Portuguese that addresses current limitations of clinical information systems. Umedicine is an application for physicians, patients and administrative staff that keeps clinical data (e.g., symptoms, clinical examination results, and treatments prescribed) up to date in a database in a structured way. It also provides easy and quick access to a large amount of clinical data collected over time. Furthermore, Umedicine supports the application of a particular clustering algorithm and a visualization module for analyzing patient time-series data, to identify evolution patterns. Preliminary user tests revealed promising results, showing that users were able to identify the evolution of groups of patients over time and their common characteristics.
Nuno F. Lages, Bernardo Caetano, Manuel J. Fonseca, João D. Pereira, Helena Galhardas, Rui Farinha
Association Rule Learning and Frequent Sequence Mining of Cancer Diagnoses in New York State
Abstract
Analyzing large-scale diagnosis histories of patients could help to discover comorbidity or disease progression patterns. Recently, open data initiatives have made it possible to access statewide patient data at the individual level, such as the New York State SPARCS data. The goal of this study is to explore frequent disease co-occurrence and sequence patterns of cancer patients in New York State using SPARCS data. Our collection includes 18,208,830 discharge records from 1,565,237 patients with cancer-related diagnoses during 2011–2015. We use the Apriori algorithm to discover top disease co-occurrences for common cancer categories based on support. We generate top frequent sequences of diagnoses with at least one cancer-related diagnosis from patients’ diagnosis histories using the cSPADE algorithm. Our data-driven approach provides essential knowledge to support the investigation of disease co-occurrence and progression patterns for improving the management of multiple diseases.
Yu Wang, Fusheng Wang
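The support-based co-occurrence discovery mentioned in the abstract above can be sketched as follows. This shows only the first two levels of an Apriori-style search (frequent single items, then candidate pairs built from them). It is not the authors' implementation, and the diagnosis labels are invented toy data, not SPARCS records. The function name `frequent_pairs` and the `min_support` parameter are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for pairs meeting min_support.

    Support is the fraction of transactions containing both items.
    Following the Apriori pruning idea, pairs are only built from items
    that are frequent on their own.
    """
    n = len(transactions)
    if n == 0:
        return {}
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)  # sorted for stable pair order
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


# Toy diagnosis sets, one per hypothetical patient.
patients = [
    {"lung cancer", "COPD", "hypertension"},
    {"lung cancer", "COPD"},
    {"breast cancer", "hypertension"},
    {"lung cancer", "hypertension"},
]
pairs = frequent_pairs(patients, min_support=0.5)
```

The pruning step is what keeps the search tractable on large collections: any pair containing an infrequent item cannot itself be frequent, so it is never counted.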
Healthsurance – Mobile App for Standardized Electronic Health Records Database
Abstract
With the increasing popularity of Electronic Health Records (EHRs), there arises a need to understand their importance in clinical contexts for a standards-based health application. Standards for semantic interoperability propose the use of archetypes for building a health application. EHRs are usually entered and stored through graphical user interfaces. Generally, the user interface is static with respect to the underlying medical concept, is often built manually, and is prone to errors. However, evolving knowledge demands dynamically generated user interfaces to reduce time, minimize cost, and enhance reliability. The current research implements a mobile app for a standardized Electronic Health Records database, termed HEALTHSURANCE. The application maintains its dynamic behavior by creating graphical user interfaces at runtime, using knowledge from the artefacts (known as archetypes) available from standard clinical repositories (such as the Clinical Knowledge Manager). This provides easy and hassle-free operation for users without any need for a mobile developer. A standardized format and content help improve the credibility of the data and maintain a uniform and specific set of constraints used to evaluate the user’s health. A generic centralized database is chosen for data storage to support evolution in clinical knowledge and to handle the heterogeneity of EHR data. Implementing a mobile app based on the archetype paradigm avoids reimplementing systems and migrating databases, and allows the creation of future-proof systems.
Prateek Jain, Sagar Bhargava, Naman Jain, Shelly Sachdeva, Shivani Batra, Subhash Bhalla
Backmatter
Metadata
Title
Data Management and Analytics for Medicine and Healthcare
Edited by
Edmon Begoli
Fusheng Wang
Gang Luo
Copyright Year
2017
Electronic ISBN
978-3-319-67186-4
Print ISBN
978-3-319-67185-7
DOI
https://doi.org/10.1007/978-3-319-67186-4