2017 | Book

Data Management and Analytics for Medicine and Healthcare

Second International Workshop, DMAH 2016, Held at VLDB 2016, New Delhi, India, September 9, 2016, Revised Selected Papers

About this book

This book constitutes the thoroughly refereed post-conference proceedings of the Second International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2016, in New Delhi, India, in September 2016, held in conjunction with the 42nd International Conference on Very Large Data Bases, VLDB 2016.
The 7 revised full papers presented together with 2 invited papers and 3 keynote abstracts were carefully reviewed and selected from 11 initial submissions. The papers are organized in topical sections on knowledge discovery of biomedical data; managing, querying and processing of medical image data; information extraction and data integration for biomedical data; and health information systems.

Table of Contents

Frontmatter

Knowledge Discovery of Biomedical Data

Frontmatter
Exploiting HPO to Predict a Ranked List of Phenotype Categories for LiverTox Case Reports
Abstract
Drug-induced liver injury (DILI) is an uncommon but important and challenging adverse drug event that develops following the use of drugs, both prescription and over-the-counter. Early detection of DILI cases can greatly improve patient care, as discontinuing the offending drugs is essential for the care of DILI cases. An online resource, LiverTox, has been established to provide up-to-date, comprehensive clinical information on DILI in the form of case reports. In this study, we explored the use of the Human Phenotype Ontology (HPO) to annotate case reports with HPO terms and to predict a ranked list of phenotype categories (describing patient outcomes) that most closely match the HPO annotations attached to the case report. The prediction performance of our method was found to be good to excellent for 67% of the case reports included in this study, i.e., the phenotype category that was assigned to the report was among the Top 3 predicted phenotype category descriptions. Future directions include incorporating other annotations and laboratory findings and exploring other semantic-based methods for case report retrieval and ranking.
Casey Lynnette Overby, Louiqa Raschid, Hongfang Liu
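The Top 3 criterion mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the category labels in the usage lines are hypothetical, and the sketch only shows how a "hit within the first k ranked categories" rate is computed.

```python
from typing import List, Sequence, Tuple


def top_k_hit(ranked: Sequence[str], assigned: str, k: int = 3) -> bool:
    """Return True if the assigned category is among the first k ranked predictions."""
    return assigned in ranked[:k]


def top_k_rate(cases: List[Tuple[Sequence[str], str]], k: int = 3) -> float:
    """Fraction of cases whose assigned category appears in the top k predictions."""
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(top_k_hit(ranked, assigned, k) for ranked, assigned in cases)
    return hits / len(cases)


# Hypothetical data: (ranked predicted categories, category assigned to the report).
example_cases = [
    (["cat_a", "cat_b", "cat_c", "cat_d"], "cat_b"),  # hit at rank 2
    (["cat_c", "cat_a", "cat_d", "cat_b"], "cat_b"),  # miss, rank 4
    (["cat_d", "cat_c", "cat_a", "cat_b"], "cat_a"),  # hit at rank 3
]
rate = top_k_rate(example_cases, k=3)
```

With these three hypothetical cases, two of the three assigned categories fall within the top 3.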
Patient Records Retrieval System for Integrated Care in Treatment of Cervical Spine Defect
Abstract
In clinical decision making, information on the treatment of patients who show medical conditions and symptoms similar to the current case is one of the most relevant information sources for creating a good, evidence-based treatment plan. However, the retrieval of similar cases is still challenging and automatic support is missing. The reasons are two-fold: First, the query formulation is difficult since multiple criteria need to be selected and specified in short query phrases. Second, the discrete storage of multimedia patient records makes the retrieval and summary of a patient history extremely difficult. In this paper, we present a retrieval system for electronic health records (EHR). More specifically, a retrieval platform for EHRs that supports clinical decision making in the treatment of cervical spine defects, using information extracted from the textual data of patient records, is implemented as a prototype. The patient cases are classified according to cervical spine defect classes, while the classification relies upon rules obtained from the corresponding defect classification schema and guidelines. In a retrospective study, the classifier is applied to clinical documents and the classification results are evaluated.
Yihan Deng, Kerstin Denecke

Managing, Querying and Processing of Medical Image Data

Frontmatter
IEVQ: An Iterative Example-Based Visual Query for Pathology Database
Abstract
Microscopic image analysis of nuclei in pathology images generates a tremendous amount of spatially derived data to support biomedical research and potential diagnosis. Such spatial data can be managed by traditional SQL based spatial databases and queried by SQL for spatial relationships. However, traditional spatial databases are designed for structured data with limited expressibility, which makes it difficult to support queries for complex visual patterns. Moreover, SQL based queries are not intuitive for biomedical researchers or pathologists.
In this paper, we investigate the expressive power of visual query for spatial databases and propose an effective yet general Iterative Example-based Visual Query (IEVQuery) framework to query shapes and distributions. More specifically, we extract features from nuclei in pathology databases, such as shape polygons, nuclei density distributions, and nuclei growth directions, to build search indexes. The user employs visual interactions such as sketching to input queries for interesting patterns. Meanwhile, the user is allowed to iteratively create queries, which are based on previous search results, to tune the features more finely and find preferred results. We build a system to enable users to specify sketch based queries interactively for (1) nuclei shapes, (2) nuclei densities, and (3) nuclei growth directions. To validate our methods, we take a pathology database [11] consisting of hundreds of millions of nuclei, and enable the user to search the database for the best-matching results through our system.
Cong Xie, Wen Zhong, Jun Kong, Wei Xu, Klaus Mueller, Fusheng Wang
Storing and Querying DICOM Data with HYTORMO
Abstract
In the health care industry, the DICOM (Digital Imaging and Communications in Medicine) standard has become very popular for storage and transmission of digital medical images and reports. The ever-increasing size, high velocity and variety of the DICOM data collections make it increasingly inefficient to store and query them using a single data storage technique, e.g., a row store or a column store. In this study, we first highlight challenges in DICOM data management. We then describe HYTORMO, a new model to store and query the DICOM data. HYTORMO uses a hybrid data storage strategy that aims not only to leverage the advantages of both row and column stores, but also to keep a trade-off among reducing disk I/O cost, reducing tuple construction cost and reducing storage space. In addition, Bloom filters are applied to reduce network I/O cost during query processing. We prototyped our model on top of Spark. Our preliminary experiments validate the proposed model on real DICOM datasets and show the effectiveness of our method.
Danh Nguyen-Cong, Laurent d’Orazio, Nga Tran, Mohand-Said Hacid
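The abstract above mentions Bloom filters as a way to reduce network I/O during query processing. The following is a generic, minimal sketch of that idea and not HYTORMO's implementation: the class, its parameters, and the sample identifiers are all assumptions made for illustration. A filter built from the join keys of one table is used to discard rows of another table that cannot match, before those rows would be shipped across the network.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Hypothetical join: keep only image rows whose patient id may appear in the patient table.
patient_ids = ["P001", "P007"]
image_rows = [("P001", "img-1"), ("P002", "img-2"), ("P007", "img-3"), ("P009", "img-4")]

bloom = BloomFilter()
for pid in patient_ids:
    bloom.add(pid)

candidate_rows = [row for row in image_rows if bloom.might_contain(row[0])]
```

Every row that truly matches survives the filter. A few non-matching rows may survive as false positives, so the exact join still has to be applied afterwards.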
Cloud-Based Whole Slide Image Analysis Using MapReduce
Abstract
Systematic analysis of high resolution whole slide images enables more effective diagnosis, prognosis and prediction of cancer and other important diseases. Due to the enormous sizes and dimensions of whole slide images, the analysis requires extensive computing resources which are not commonly available. Images have to be divided into smaller regions for processing due to computer memory limitations, which leads to inaccurate results because objects that cross region boundaries are ignored. In this paper, we propose a highly scalable and cost effective MapReduce based image analysis framework for whole slide image processing, and provide a cloud based implementation. The framework takes a grid-based overlapping partitioning scheme, and provides parallelization of image segmentation based on MapReduce. It provides graceful handling of boundary objects with a highly efficient spatial indexing based matching method, thus avoiding loss of accuracy due to partitioning. We demonstrate that the system achieves high scalability and is cost-effective: our experiments show that it costs less than fifteen cents to analyze one image on average using Amazon Elastic MapReduce.
Hoang Vo, Jun Kong, Dejun Teng, Yanhui Liang, Ablimit Aji, George Teodoro, Fusheng Wang
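The grid-based overlapping partitioning mentioned in the abstract above can be sketched as follows. This is an illustrative assumption about how such a scheme might look, not the paper's actual method: the function name, the parameters, and the sizes in the usage line are hypothetical. Each grid cell is extended by a fixed margin on every side, so an object that crosses a cell boundary is fully contained in at least one tile, provided the object is no larger than the margin.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def overlapping_tiles(width: int, height: int, tile: int, overlap: int) -> Iterator[Box]:
    """Yield grid tiles over a width x height image, each padded by `overlap` pixels."""
    if tile <= 0 or overlap < 0:
        raise ValueError("tile must be positive and overlap non-negative")
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # Pad the cell on every side, then clip to the image bounds.
            yield (
                max(0, x - overlap),
                max(0, y - overlap),
                min(width, x + tile + overlap),
                min(height, y + tile + overlap),
            )


# Hypothetical 100 x 100 image split into 50-pixel cells with a 10-pixel margin.
tiles = list(overlapping_tiles(100, 100, tile=50, overlap=10))
```

In a MapReduce setting, each tile could be segmented independently in a map task, while objects detected more than once in the overlapping margins would need to be matched and deduplicated afterwards, which is the step the abstract attributes to spatial indexing based matching.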

Information Extraction and Data Integration for Biomedical Data

Frontmatter
Drug Dosage Balancing Using Large Scale Multi-omics Datasets
Abstract
Cancer is a disease of biological and cell cycle processes, driven by dosage of the limited set of drugs, resistance, mutations, and side effects. The identification of such a limited set of drugs and their targets, pathways, and effects based on large scale multi-omics, multi-dimensional datasets is one of the key challenges in data-driven cancer genomics. This paper demonstrates the use of public databases associated with Drug-Target(Gene/Protein)-Disease to analyze in depth approved cancer drugs, their genetic associations, and their pathways in order to establish a dosage balancing mechanism. This paper will also help to understand cancer in terms of disease-associated pathways and the effect of drug treatment on cancer cells. We employ the Semantic Web approach to provide an integrated knowledge discovery process and the network of integrated datasets. The approach is employed to address biological questions involving (1) associated drugs and their omics signature, (2) identification of gene association with integrated Drug-Target databases, (3) mutations, variants, and alterations from these targets, (4) their PPI interactions and associated oncogenic pathways, and (5) associated biological processes aligned with these mutations and pathways to identify the IC-50 level of each drug along with adverse events and alternate indications. In principle, this large semantically integrated database of around 30 databases will serve as Semantic Linked Association Prediction in drug discovery to explore and expand dosage balancing and drug re-purposing.
Alokkumar Jha, Muntazir Mehdi, Yasar Khan, Qaiser Mehmood, Dietrich Rebholz-Schuhmann, Ratnesh Sahay
A Dynamic Data Warehousing Platform for Creating and Accessing Biomedical Data Lakes
Abstract
Medical research use cases are population centric, unlike clinical use cases, which are patient or individual centric. Hence, research use cases require access to medical archives and data source repositories of a heterogeneous nature. Traditionally, in order to query data from these data sources, users manually access and download part or all of the data sources. The existing solutions tend to focus on a specific data format or storage, which prevents using them for a more generic research scenario with heterogeneous data sources where the user may not know the schema of the data a priori.
In this paper, we propose and discuss the design, implementation, and evaluation of Data Café, a scalable distributed architecture that aims to address the shortcomings in the existing approaches. Data Café lets the resource providers create biomedical data lakes from various data sources, and lets the research data users consume the data lakes efficiently and quickly without having a priori knowledge of the data schema.
Pradeeban Kathiravelu, Ashish Sharma
Building an i2b2-Based Integrated Data Repository for Cancer Research: A Case Study of Ovarian Cancer Registry
Abstract
In this study, we describe our preliminary efforts in building an i2b2-based integrated data repository that supports centralized data management for ovarian cancer clinical research, and discuss important lessons learned that could inform the evaluation and enhancement of future generic cancer-specific data repositories. We collected multiple types of heterogeneous clinical data, including demographic, outcome, chemo-treatment and lab-test information for ovarian cancer. To better integrate different data types, we conducted data normalization procedures by reusing standard codes and creating mappings between local codes and standard vocabularies. We also developed extract, transform and load (ETL) scripts to load the data into an i2b2 instance. Through further analytic practices, we evaluated major expectations of the system according to common clinical research needs, including cohort query and identification, clinical data-based hypothesis-testing, and exploratory data-mining. We also identified and discussed outstanding issues that we will address through additional enhancement of the existing i2b2 system.
Na Hong, Zheng Li, Richard C. Kiefer, Melissa S. Robertson, Ellen L. Goode, Chen Wang, Guoqian Jiang

Health Information Systems

Frontmatter
AsthmaCheck: Multi-Level Modeling Based Health Information System
Abstract
Every hospital uses its own format for the information system that stores patient data. This prevents hospitals from exchanging patient data. Hence, there is a requirement for a standardized Hospital Information System (HIS). The HIS should also be able to incorporate semantic interoperability. With technological advancement, clinicians and patients should get themselves involved in using Electronic Health Records (EHRs). The current research provides a roadmap for the introduction of a domain specific clinical application following openEHR standards based on Multi-Level Modeling. Standardization will help in reducing cost and medical errors as well as enhancing data quality. The current study focuses on (1) the advantages of EHRs, (2) the need for standardization to improve the quality of health records, thereby establishing interoperability among hospitals, (3) recognizing the use of archetypes for knowledge-based systems, (4) proposing a framework for standardization, and (5) comparison of the proposed approach with current HIS.
Tanveen Singh Bharaj, Shelly Sachdeva, Subhash Bhalla
Backmatter
Metadata
Title
Data Management and Analytics for Medicine and Healthcare
Edited by
Fusheng Wang
Lixia Yao
Gang Luo
Copyright Year
2017
Electronic ISBN
978-3-319-57741-8
Print ISBN
978-3-319-57740-1
DOI
https://doi.org/10.1007/978-3-319-57741-8
