2016 | Book

Bioinformatics and Biomedical Engineering

4th International Conference, IWBBIO 2016, Granada, Spain, April 20-22, 2016, Proceedings

About this book

This book constitutes the refereed proceedings of the 4th International Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2016, held in Granada, Spain, in April 2016.

The 69 papers presented were carefully reviewed and selected from 286 submissions. The scope of the conference spans the following areas: bioinformatics for healthcare and diseases; biomedical image analysis; biomedical signal analysis; computational systems for modeling biological processes; eHealth; tools for next generation sequencing data analysis; assistive technology for people with neuromotor disorders; fundamentals of biological dynamics and maximization of the information extraction from the experiments in the biological systems; high performance computing in bioinformatics, computational biology and computational chemistry; human behavior monitoring, analysis and understanding; pattern recognition and machine learning in the -omics sciences; and resources for bioinformatics.

Table of Contents

Frontmatter

Bioinformatics for Healthcare and Diseases

Frontmatter
The Complexity of Some Pattern Problems in the Logical Analysis of Large Genomic Data Sets

Many biomedical experiments produce large data sets in the form of binary matrices, with features labeling the columns and individuals (samples) associated with the rows. An important case is when the rows are also labeled into two groups, namely the positive (or healthy) and the negative (or diseased) samples. The Logical Analysis of Data (LAD) is a procedure aimed at identifying relevant features and building Boolean formulas (rules) which can be used to classify new samples as positive or negative. These rules are said to explain the data set. Each rule can be represented by a string over {0,1,-}, called a pattern. A data set can be explained by alternative sets of patterns, and many computational problems arise related to the choice of a particular set of patterns for a given instance. In this paper we study the computational complexity of these pattern problems and show that they are, in general, very hard. We give an integer programming formulation for the problem of determining if two sets of patterns are equivalent. We also prove computational complexity results which imply that there should be no simple ILP model for finding a minimal set of patterns explaining a given data set.

Giuseppe Lancia, Paolo Serafini
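
As a minimal illustration of the pattern formalism summarized above (not the authors' algorithms or complexity results), the sketch below checks whether a small set of patterns over {0,1,-} explains a toy data set; the samples and the helper names covers/explains are invented for the example.

```python
# Hypothetical illustration of LAD-style patterns over {0,1,-}; data and names are invented.

def covers(pattern: str, sample: str) -> bool:
    """A pattern covers a binary sample if it agrees at every non-'-' position."""
    return all(p == '-' or p == s for p, s in zip(pattern, sample))

def explains(patterns, positives, negatives) -> bool:
    """A pattern set explains the data if it covers every positive sample and no negative one."""
    return (all(any(covers(p, s) for p in patterns) for s in positives)
            and not any(covers(p, s) for p in patterns for s in negatives))

positives = ["101", "111"]
negatives = ["000", "010"]
print(explains(["1-1"], positives, negatives))  # True: "1-1" covers both positives, no negatives
```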
Development of a Handheld Side-Stream Breath Analyser for Point of Care Metabolic Rate Measurement

A novel handheld side-stream breath analyser has been developed. The low-cost device offers breath-by-breath measurements of O2, CO2, temperature, relative humidity and gas flow rate. Metabolic rate can be calculated from the inspired and expired gas concentrations; a knowledge of this over a 24 h period can guide calorific intake. The analyser provides easy-to-read results on either a laptop or smart phone. Results for the O2 and CO2 sensors demonstrate the device’s potential for metabolic rate breath analysis. The O2 sensor is not able to follow changes in O2 concentration during the breathing cycle; however, the newly developed affordable and low-power consumption CO2 sensor performs comparably to a bulky, high power consumption commercial device.

T. A. Vincent, A. Wilson, J. G. Hattersley, M. J. Chappell, J. W. Gardner
Confluence of Genes Related to the Combined Etiology DOISm (Diabetes, Obesity, Inflammation and Metabolic Syndrome) in Dissecting Nutritional Phenotypes

The term DOISm (Diabetes, Obesity, Inflammation and metabolic Syndrome) describes a confluence of comorbidities specifying these disease phenotypes. Recent studies using genome-wide association analysis have identified genes and variations that correlate human phenotype within phenotype prediction programs. Benefiting from such post-genomics outcomes, we catalogued genes that have been associated with each of the four conditions before searching for confluence of any two or three conditions, and the confluence of genes concomitantly involved in all phenotypes. Bioinformatics analyses were performed using multi-relational data mining techniques to cover sequence, structure and functional/clinical features. We used high-confidence predictions for gene functional classification analyses for better phenotyping DOISm confluence. Our curated panel of 1439 DOISm genes and a subset of 217 confluent genes represents a platform to assist in dissecting complex nutritional phenotypes. Our repertoire of human genes likely to be involved in DOISm is an attempt to guide further subtyping of complex phenotypes.

Ana Paula Moreira Bezerra, Samara Cardoso Silva-Santiago, José Francisco Diogo Da Silva Jr., Emanuel Diego S. Penha, Monalisa M. Silveira, Myrna S. Ramos, Mônica M. Silva, Ana Carolina L. Pacheco, Diana Magalhaes Oliveira
Studying the Herd Immunity Effect of the Varicella Vaccine in the Community of Valencia, Spain

In 2013, the Spanish Agency of Medicines blocked distribution of the varicella vaccine, arguing that partial vaccine coverage could hypothetically increase the number of cases in adults: non-vaccinated children could reach adulthood without having had contact with the virus if herd immunity stopped virus circulation, or vaccine protection could hypothetically be lost in the long run. The measure also aimed to avoid increasing the number of cases of herpes zoster in adults. In this paper we develop a mathematical model of the transmission dynamics of varicella in order to assess the impact of the partial coverage of the vaccination program. This is of paramount importance because, from the public health point of view, herd immunity may be an undesirable effect of partial vaccination, given that varicella and/or herpes zoster tend to be severe in adults.

A. Díez-Gandía, R. -J. Villanueva, J. -A. Moraño, L. Acedo, J. Mollar, J. Díez-Domingo
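
The sketch below is a schematic SIR-type model with partial vaccination at birth, shown only to convey the kind of transmission model the abstract refers to; the rates, coverage value and initial conditions are illustrative placeholders, not the authors' calibrated parameters for the Community of Valencia.

```python
# Schematic SIR model with partial newborn vaccination; all parameter values are placeholders.
from scipy.integrate import solve_ivp

beta, gamma = 0.55, 1 / 7       # transmission and recovery rates (per day), illustrative
mu = 1 / (75 * 365)             # birth/death rate for a stationary population
coverage = 0.5                  # fraction of newborns vaccinated (partial coverage)

def sir_vacc(t, y):
    s, i, r = y                               # susceptible, infectious, recovered fractions
    ds = mu * (1 - coverage) - beta * s * i - mu * s
    di = beta * s * i - gamma * i - mu * i
    dr = mu * coverage + gamma * i - mu * r
    return [ds, di, dr]

sol = solve_ivp(sir_vacc, (0, 20 * 365), [0.6, 1e-4, 0.4 - 1e-4], max_step=1.0)
print("susceptible fraction after 20 years:", round(sol.y[0, -1], 4))
```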
Angel: Towards a Multi-level Method for the Analysis of Variants in Individual Genomes

Genomic medicine seeks to develop methods for improving early diagnosis, increasing the efficiency of treatments and facilitating the discovery of new therapies, and mainly searches for associations between the genotype of individuals and their phenotypical features. The huge genomic variability is a major difficulty for developing effective computational methods, since the correlation of a locus and a phenotype does not necessarily mean causality. Hence, methods for genome-based diagnosis need to take into account the complexity of the genomic background and the biological networks involved in the manifestation of phenotypes and disorders. We describe a method for analysing the variants identified in the genomes of human individuals sequenced using Next-Generation Sequencing techniques; the analysis is based on existing knowledge about genes, pathways and phenotypes. The method generates quantitative scores at the gene, pathway and phenotype levels, which represent the degree of functional disorder of the corresponding gene or pathway and the contribution of the genomic variants to the development of a specific phenotype. The validation experiments performed with exomes of patients with “Congenital Disorder of Glycosylation, Type IA” (CDG1A) have shown positive results.

Ginés Almagro-Hernández, Francisco García-Sánchez, María Eugenia de la Morena-Barrio, Javier Corral, Jesualdo Tomás Fernández-Breis
Transcriptome-Based Identification of a Seed Olive Legumin (11S Globulin). Characterization of Subunits, 3D Modelling and Molecular Assessment of Allergenicity

Seed storage proteins (SSPs) are fundamental molecules for seed germination as an important source of carbon and nitrogen. Among the four main protein families that make up SSPs, legumins (11S globulins) are widely distributed in dicots and represent the major contribution to the pool of seed proteins in olive. In the present study, we have used an olive seed transcriptome generated de novo by 454/Roche Titanium+ sequencing to identify a broad panel of 11S protein sequences. Among the identified legumin sequences, five were selected using two criteria: their presence among the BLASTP results against the whole NCBI database, and their clustering with previously-characterized 11S sequences in the phylogenetic analysis. The selected sequences were identified as corresponding to isoform 2 of the 11S protein precursor, and one of the sequences was used for further analysis. The individual acidic and basic subunits within this sequence were recognised, 3D-modeled and assessed with regard to their potential molecular interaction by docking methods. Furthermore, T-cell epitopes were predicted using dedicated software in order to evaluate the putative implications of the olive 11S proteins in food allergy. The potential use of this protein, which is highly abundant in the olive seed, as a food source is discussed.

Adoración Zafra, José Carlos Jimenez-Lopez, Rosario Carmona, Gonzalo Claros, Juan de Dios Alché
A Feature Selection Scheme for Accurate Identification of Alzheimer’s Disease

Effective biomarkers play an important role in the accurate diagnosis of Alzheimer’s Disease (AD), including its intermediate stage (i.e. mild cognitive impairment, MCI). In this paper, a new feature selection scheme is proposed to improve the identification of AD and MCI from healthy controls (HC) using a support vector machine (SVM)-based classifier with recursive feature addition. Our method finds the significant features automatically, and the experiments in this work demonstrate that the scheme achieves better classification performance on a dataset of 103 subjects for which three biomarkers, i.e., structural MR imaging (MRI), functional PET imaging, and cerebrospinal fluid (CSF), were used. The proposed method identified AD from HC with an accuracy of 95.0 %, compared with only 89.3 % for the classifier without the feature selection step. In addition, some of the features selected in this work have been shown by previous studies to be strongly related to AD, which supports the significance of our results.

Hao Shen, Wen Zhang, Peng Chen, Jun Zhang, Aiqin Fang, Bing Wang
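
A greedy forward selection loop of the kind described above (recursive feature addition wrapped around an SVM) might look like the following sketch; the synthetic data, the stopping rule and the linear kernel are assumptions for illustration, not the authors' exact protocol.

```python
# Illustrative recursive feature addition with an SVM; data and stopping rule are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=103, n_features=50, n_informative=8, random_state=0)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    # try adding each remaining feature and keep the one that improves CV accuracy the most
    scores = {f: cross_val_score(SVC(kernel="linear"), X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:        # stop when no remaining feature improves the score
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected features:", selected, "CV accuracy: %.3f" % best_score)
```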
HYDROWEB, an Online Tool for the Calculation of Hydrodynamic Properties of Macromolecules

Calculation and prediction of hydrodynamic properties of biological and synthetic macromolecules through computational approaches is a technique that has advanced greatly in the last decades. However, most hydrodynamics software was designed decades ago and is rather complex to use for less experienced computer users. With this in mind, we have developed HYDROWEB, a tool that makes it easy to work with hydrodynamic models of macromolecules, calculate their properties (using the programs HYDROPRO, HYDRO++ and SIMUFLEX) and conveniently visualize the results. The tool can be accessed at http://bio-hpc.eu/software/hydroweb/.

Horacio Pérez-Sánchez, Jorge Peña-García, Helena den-Haan, Ricardo Rodríguez-Schmidt, José P. Cerón-Carrasco, Adriano N. Raposo, Mounira Bouarkat, Sid Ahmed Sabeur, Francisco Guillermo Díaz-Baños
The Use of the Miyazawa-Jernigan Residue Contact Potential in Analyses of Molecular Interaction and Recognition with Complementary Peptides

The classic results by Biro, Blalock and Root-Bernstein link genetic code nucleotide patterns to amino acid properties, protein structure and interaction. This study explores the use of the Miyazawa-Jernigan residue contact potential in analyses of protein interaction and recognition between sense and complementary (antisense) peptides. We show that Miyazawa-Jernigan residue contact energies, derived from 3D data, define the recognition rules of peptide-peptide interaction based on the complementary coding of DNA and RNA sequences. The model is strongly correlated with several other chemoinformatic scales often used for the determination of protein antigenic sites and transmembrane regions (Parker et al. r = 0.94; Rose et al. r = −0.92; Manavalan-Ponnuswamy r = −0.92; Cornette et al. r = −0.91; Kolaskar-Tongaonkar r = −0.91; Grantham r = 0.90; White-Wimley (octanol) r = −0.88; Kyte-Doolittle r = −0.85). The algorithms presented have important biomedical and proteomic applications related to modulation of the peptide-receptor function and epitope-paratope interaction, the design of lead compounds and the development of new immunochemical assays and diagnostic procedures.

Nikola Štambuk, Paško Konjevoda, Zoran Manojlović, Renata Novak Kujundžić
Comparative Analysis of microRNA-Target Gene Interaction Prediction Algorithms - The Attempt to Compare the Results of Three Algorithms

MicroRNAs are small, non-coding molecules (21–25 nucleotides). They regulate gene expression by downregulating the target gene or repressing translation, and they are involved in cancer growth. Computational target prediction programs are growing and developing steadily. The plethora of prediction algorithms makes it difficult to choose the one algorithm that may give satisfactory results, i.e. the possible binding site of the microRNA on the target gene. A result is considered satisfactory when it is statistically significant and there is a high probability that a specific gene is the target of a real microRNA. In order to compare the results obtained from different algorithms, we have to define one probability space for each of them. We performed an appropriate statistical test (Fisher’s exact test) to ensure that we can juxtapose the results of three different algorithms which take into account different aspects of microRNA binding to the target gene. The conclusion of our work is a suggested way in which one can juxtapose the results of algorithms based on different methods of predicting possible miRNA-target gene interactions.

Anna Krawczyk, Joanna Polańska
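
The core statistical step mentioned above can be illustrated with a 2x2 contingency table and Fisher's exact test; the counts below are invented, and the table layout (predicted vs. validated interactions) is only one plausible way to set up such a comparison.

```python
# Hypothetical 2x2 table comparing one algorithm's predictions against validated interactions.
from scipy.stats import fisher_exact

# rows: predicted / not predicted; columns: validated / not validated miRNA-target pairs
table = [[40, 160],      # predicted:     40 validated, 160 not validated
         [60, 3740]]     # not predicted: 60 validated, 3740 not validated
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2e}")
```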
A Novel Divisive iK-Means Algorithm with Region-Driven Feature Selection as a Tool for Automated Detection of Tumour Heterogeneity in MALDI IMS Experiments

Due to the constantly increasing cancer incidence rates and varying levels of effectiveness of the utilised therapeutic approaches, obtaining a clear understanding of the underlying phenomena is of the utmost importance. The problem is tackled by numerous research groups worldwide, utilising a number of molecular biology quantification techniques. MALDI-IMS (Matrix-Assisted Laser Desorption Ionization – Imaging Mass Spectrometry) is a quantification technique that brings together MALDI mass spectrometry with tissue imaging by multiple applications of the laser beam to a raster of points on the surface of the analysed tissue. The application of MALDI-IMS in cancer research allows for the spatial identification of molecular profiles and their heterogeneity within the tumour, but leads to the creation of highly complicated datasets of great volume. Extraction of relevant information from such datasets relies on the design of appropriate algorithms and using them as the base to construct efficient data mining tools. Existing computational tools for MALDI-IMS exhibit numerous shortcomings and limited utility and cannot be used for fully automated discovery of heterogeneity in tumour samples. We developed a novel signal analysis pipeline including signal pre-processing, spectrum modelling and intelligent spectra clustering with region-driven feature selection to efficiently analyse these data. The idea of combining the divisive iK-means algorithm with peptide abundance variance-based dimension reduction, performed independently for each analysed sub-region, allowed for the discovery of squamous cell carcinoma and keratinized stratified squamous epithelium together with stratified squamous epithelium within an exemplary head and neck tumour tissue.

Grzegorz Mrukwa, Grzegorz Drazek, Monika Pietrowska, Piotr Widlak, Joanna Polanska
Multigene P-value Integration Based on SNPs Investigation for Seeking Radiosensitivity Signatures

Dysregulation of apoptosis is a key attribute of cancer, especially when induced by disruption of p53 expression. Radiotherapy, sometimes supported by chemotherapy and/or prior surgery, is recommended in the majority of cases, but despite very well defined treatment protocols and high-quality irradiation procedures, a huge dispersion in response to radiotherapy is observed among cancer patients. According to current knowledge, patient radiosensitivity is at least partially responsible for the different reactions to ionising radiation. Here we concentrate on the investigation of single nucleotide polymorphisms (SNPs) which could possibly explain the radiation response phenomena. To reach this goal, dependent and independent methods of p-value integration are presented and compared; both statistical and molecular function domains are used in the comparison study. We propose a novel method of p-value integration which includes control of the gene expression trend and introduces an adaptive significance level. Moreover, a multigene approach is proposed, in contrast to the classical single-gene investigation. As a result, a set of statistically significant polymorphisms was obtained, some of which were identified as potentially deleterious for the KRAS signalling pathway.

Joanna Zyla, Christophe Badie, Ghazi Alsbeih, Joanna Polanska
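
For orientation only, the sketch below shows the two standard p-value integration baselines (Fisher's and Stouffer's methods) applied to per-SNP p-values of one gene; the p-values are invented and the authors' novel adaptive method is not reproduced here.

```python
# Baseline p-value integration for a set of SNP p-values belonging to one gene (values invented).
import numpy as np
from scipy.stats import combine_pvalues

snp_pvalues = np.array([0.03, 0.20, 0.008, 0.65])    # hypothetical per-SNP p-values

_, p_fisher = combine_pvalues(snp_pvalues, method="fisher")
_, p_stouffer = combine_pvalues(snp_pvalues, method="stouffer")
print(f"Fisher combined p = {p_fisher:.4f}, Stouffer combined p = {p_stouffer:.4f}")
```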
Epithelial-Mesenchymal Transition Regulatory Network-Based Feature Selection in Lung Cancer Prognosis Prediction

Feature selection techniques are often applied to identify cancer prognosis biomarkers. However, many feature selection methods are prone to over-fitting or poor biological interpretation when applied to high-dimensional biological data. Network-based feature selection and data integration approaches have been proposed to identify more robust biomarkers. We conducted experiments to investigate the advantages of the two approaches using the epithelial-mesenchymal transition regulatory network, which has been demonstrated to be highly relevant to cancer prognosis. We obtained data from The Cancer Genome Atlas. Prognosis prediction was made using a Support Vector Machine. Under our experimental settings, the results showed that network-based features gave significantly more accurate predictions than individual molecular features, and features selected from integrated data (RNA-Seq and micro-RNA data) gave significantly more accurate predictions than features selected from single-source data (RNA-Seq data). Our study indicates that biological network-based feature transformation and data integration are two useful approaches to identify robust cancer biomarkers.

Borong Shao, Tim Conrad

Biomedical Image Analysis

Frontmatter
Tracking a Real Liver Using a Virtual Liver and an Experimental Evaluation with Kinect v2

In this study, we propose a smart transcription algorithm for translation and/or rotation motions. This algorithm has two phases: calculating the differences between real and virtual 2D depth images, and searching the motion space defined by three translation and three rotation degrees of freedom based on the depth differences. One depth image is captured from a real liver using a Kinect v2 depth camera, and another is obtained from a virtual liver (a polyhedron in stereo-lithography (STL) format) by z-buffering with a graphics processing unit. The STL data are converted from Digital Imaging and Communications in Medicine (DICOM) data, where the DICOM data are captured from a patient’s liver using magnetic resonance imaging and/or a computed tomography scanner. In this study, we evaluated the motion precision of our proposed algorithm in several experiments using a Kinect v2 depth camera.

Hiroshi Noborio, Kaoru Watanabe, Masahiro Yagi, Yasuhiro Ida, Shigeki Nankaku, Katsuhiko Onishi, Masanao Koeda, Masanori Kon, Kosuke Matsui, Masaki Kaibori
Thermal Imaging-Based Muscular Activity in the Biomechanical Study of Surgeons

The use of minimally invasive surgery has introduced many modifications in surgical procedures. Despite the advantages that this kind of surgery provides, surgeons have to confront many ergonomic problems during their interventions. In fact, the poor ergonomic characteristics of the workplace reduce the efficiency of the interventions and produce undesirable effects such as physical fatigue or musculoskeletal injuries. Electromyography has traditionally been used to measure muscular effort in the workplace. However, recent studies have highlighted thermal imaging as a valuable alternative for determining muscular activity. One of the main advantages of thermal imaging is that it is not necessary to foresee which muscular groups will be activated during surgery. In this paper, thermal imaging is used to evaluate the muscular effort of surgeons and the results are compared with electromyography. The paper shows the features of this technique and its relationship with electromyography.

Ramon Sancibrian, Maria C. Gutierrez-Diez, Carlos Redondo-Figuero, Esther G. Sarabia, Maria A. Benito-Gonzalez, Jose C. Manuel-Palazuelos
FCM-Based Method for MRI Segmentation of Anatomical Structure

Fuzzy C-means (FCM) has been widely applied to the segmentation of medical images, especially MRI images, for identifying living organs and supporting medical diagnosis. In practice, however, this method is too sensitive to image noise. Many methods have therefore been proposed to improve the objective function of FCM by adding a penalty term to it. One drawback of these methods is that they can determine neither the appropriate size of the observation window around each pixel of interest for incorporating spatial information, nor a suitable importance coefficient for the penalty term. Moreover, modifying the objective function of FCM often requires additional complex derivations. In this paper, we develop a new FCM-based method for medical MRI image segmentation. This method dynamically determines the optimal size of the observation window for each pixel of interest without adding any penalty term. Moreover, an n-dimensional feature vector including both local and global spatial information between neighboring pixels is generated to describe each pixel in the objective function, and specialized a priori knowledge is integrated into the segmentation procedure in order to control the application of FCM to tissue classification of the thigh. The effectiveness and robustness of the proposed method have been validated on real MRI images of the thigh.

Pinti Antonio
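
For reference, the baseline the paper builds on is the standard fuzzy C-means iteration (no spatial penalty, no adaptive windows, no a priori knowledge); a bare-bones version is sketched below on synthetic pixel intensities, with all names and values invented.

```python
# Plain fuzzy C-means on synthetic 1D pixel intensities; baseline only, not the proposed method.
import numpy as np

def fcm(X, c=3, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))           # membership matrix, shape (n_pixels, c)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # fuzzy-weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)         # standard FCM membership update
    return centers, U

X = np.random.default_rng(1).normal(size=(500, 1))       # stand-in for 500 pixel intensities
centers, U = fcm(X)
print("cluster centers:", np.sort(centers.ravel()).round(3))
```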
An Automated Tensorial Classification Procedure for Left Ventricular Hypertrophic Cardiomyopathy

Cardiovascular diseases are the leading cause of death globally. Therefore, classification tools play a major role in the prevention and treatment of these diseases. Statistical learning theory applied to magnetic resonance imaging has led to the diagnosis of a variety of cardiomyopathy states. We propose a two-stage classification scheme capable of distinguishing between heterogeneous groups of hypertrophic cardiomyopathies and healthy patients. A multimodal processing pipeline is employed to estimate robust tensorial descriptors of myocardial mechanical properties for both short-axis and long-axis magnetic resonance tagged images using the least absolute deviation method. A homomorphic filtering procedure is used to align the cine segmentations to the tagged sequence and provides 3D tensor information in meaningful areas. Results have shown that the proposed pipeline provides tensorial measurements on which classifiers for the study of hypertrophic cardiomyopathies can be built with acceptable performance, even for reduced sample sets.

Santiago Sanz-Estébanez, Javier Royuela-del-Val, Susana Merino-Caviedes, Ana Revilla-Orodea, Teresa Sevilla, Lucilio Cordero-Grande, Marcos Martín-Fernández, Carlos Alberola-López
Depth Image Matching Algorithm for Deforming and Cutting a Virtual Liver via Its Real Liver Image Captured Using Kinect v2

In this paper, we propose a smart deforming and/or cutting transcription algorithm for rheology objects such as human livers. Moreover, the performance and shape precision of the proposed algorithm are experimentally verified by deforming a real clay liver and/or cutting a gel block prepared at human body temperature. First, we capture an image of a patient’s liver as Digital Imaging and Communications in Medicine (DICOM) data generated by magnetic resonance imaging (MRI) and/or a computed tomography (CT) scanner. Then, the DICOM data is segmented and converted into four types of stereo-lithography (STL) polyhedra, which correspond to the whole liver and three blood vessels. Second, we easily overlap the virtual and real liver images in our mixed reality (MR) surgical navigation system using our initial position/orientation/shape adjustment system, which uses color images to differentiate between real and virtual depth images. After overlapping, as long as the real liver is deformed and/or cut by a human (doctor), the liver is constantly captured by Kinect v2. Subsequently, by using the real depth image captured in real time, many vertices of the virtual polyhedral liver in STL format are pushed/pulled by viscoelastic elements, called Kelvin–Voigt materials, located on the vertices. Finally, after determining the displacements of the vertices, we obtain an adequately shaped STL. The vertex positions required for fixing the shape are calculated using the Runge–Kutta method.

Hiroshi Noborio, Kaoru Watanabe, Masahiro Yagi, Kentaro Takamoto, Shigeki Nankaku, Katsuhiko Onishi, Masanao Koeda, Masanori Kon, Kosuke Matsui, Masaki Kaibori
Optic Disc Segmentation with Kapur-ScPSO Based Cascade Multithresholding

The detection of significant retinal regions (segmentation) is an indispensable step in the computer-aided diagnosis of retinal diseases. At this point, the image segmentation algorithm needs to be quick in order to spare time for the feature selection and classification stages. In this paper, we address the fast and accurate segmentation of optic discs in retinal images. For this purpose, a cascade multithresholding (CMT) process is proposed based on a novel optimization algorithm (Scout Particle Swarm Optimization) and an efficient cost function (Kapur). Scout Particle Swarm Optimization (ScPSO) originates from Particle Swarm Optimization (PSO) and improves standard PSO by using a necessary part taken from Artificial Bee Colony (ABC) Optimization. In other words, the most important handicap of PSO (regeneration of useless particles) is eliminated by forming ScPSO, obtained by adding the scout bee phase from ABC into standard PSO. In this study, this novel method (ScPSO) constitutes the optimization part of the multithresholding process. The Kapur function is preferred as the cost function used in ScPSO, since Kapur provides low standard deviations in the output of optimization-based multithresholding techniques in the literature. In this manner, a well-combined structure (Kapur-ScPSO) is generated for cascade multithresholding. Optic disc images taken from the DRIVE database are used for statistical and visual comparison. As a result, Kapur-ScPSO based CMT can delineate the optic disc quickly (7–8 s) with rates of 77.08 % precision, 57.89 % overlap and 95.59 % accuracy.

Hasan Koyuncu, Rahime Ceylan
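
Kapur's entropy criterion, the cost function that ScPSO maximizes in the paper, is easy to state for a single threshold; the sketch below evaluates it exhaustively on a synthetic histogram (an exhaustive search, not ScPSO, and invented data).

```python
# Kapur's entropy for a single threshold on a synthetic 8-bit histogram; search is exhaustive here.
import numpy as np

def kapur_entropy(hist, t):
    """Sum of the entropies of the two classes obtained by splitting the histogram at t."""
    p = hist / hist.sum()
    total = 0.0
    for cls in (p[:t], p[t:]):
        w = cls.sum()
        if w == 0:
            return -np.inf
        q = cls[cls > 0] / w
        total += -np.sum(q * np.log(q))
    return total

rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(60, 10, 4000), rng.normal(170, 15, 1000)])
hist = np.histogram(pixels, bins=256, range=(0, 255))[0]
best_t = max(range(1, 256), key=lambda t: kapur_entropy(hist, t))
print("Kapur-optimal threshold:", best_t)
```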

Biomedical Signal Analysis

Frontmatter
Uncertainty in 1D and 3D Models of a Fiber Stimulated by an External Electrode

The one-dimensional (1D) cable model is used to study the electrical excitation of nerves and muscle fibers, and to aid in the design of electrical therapies. However, approximations inherent in the cable model limit its validity. More realistic three-dimensional (3D) fiber models have been advocated, but they require long computational times. This study investigates whether the better accuracy of 3D models is worth the cost by computing the probability p that the difference between outputs of the 3D and 1D models could have arisen from uncertainties in parameter values. The results are summarized in contour maps of the probability p in the space of fiber-electrode distances and stimulus durations. The cable model is considered valid where $$p>0.05$$. This region of validity depends on the uncertainties in the parameters. In particular, the uncertainties must exceed 0.05 (0.02) of the nominal parameter values for the cable model to be valid in the regions where retinal (cochlear) implants operate.

Wanda Krassowska Neu
A Comparison of Feature Ranking and Rank Aggregation Techniques in Automatic Sleep Stage Classification Based on Polysomnographic Signals

Sleep quality is one of the most important measures of healthy life, especially considering the huge number of sleep-related disorders. Identifying sleep stages using multi-channel recordings like polysomnographic (PSG) signals is an effective way of assessing sleep quality. However, manual sleep stage classification is time-consuming, tedious and highly subjective. To overcome this, automatic sleep classification was proposed, in which pre-processing, feature extraction and classification are the three main steps. Since classification accuracy is deeply affected by feature selection, in this paper several feature selection methods as well as rank aggregation methods are compared. Feature selection methods are evaluated by three criteria: accuracy, stability and similarity. For classification, two different classifiers (k-nearest neighbor and multilayer feedforward neural network) were utilized. Simulation results show that MRMR-MID achieves the highest classification performance, while the Fisher method provides the most stable rankings.

Shirin Najdi, Ali Abdollahi Gharbali, José Manuel Fonseca
Improved Dynamic Time Warping for Abnormality Detection in ECG Time Series

Abnormality detection in ECG time series is very important for cardiologists to automatically detect heart diseases. In this study, we propose a novel algorithm that efficiently compares and aligns quasi-periodic time series. We apply this algorithm to detect exactly where in the ECG the anomaly is. For this purpose, we use a normal (healthy) ECG segment and compare it with another ECG segment. Our algorithm is an improvement of the well-known dynamic time warping algorithm, called Improved Dynamic Time Warping (I-DTW). Indeed, the alignment of quasi-periodic time series, such as those representing the ECG signal, is impossible to achieve with DTW, especially when the ECG segments are of different lengths and composed of different numbers of periods. The tests were performed on ECG time series selected from the public database of the “Massachusetts Institute of Technology - Beth Israel Hospital (MIT-BIH)”. The results show that the proposed method outperforms the classical DTW method in terms of alignment accuracy and that it can be a good method for abnormality detection in ECG time series.

Imen Boulnemour, Bachir Boucheham, Slimane Benloucif
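
The classical DTW recurrence that I-DTW improves upon is sketched below on toy signals of different lengths; the I-DTW modifications themselves are not reproduced, and the "ECG segments" are synthetic sine waves.

```python
# Classical DTW distance between two toy quasi-periodic signals of different lengths.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # step pattern: match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

normal = np.sin(np.linspace(0, 4 * np.pi, 200))       # stand-in for a healthy ECG segment
test = 1.2 * np.sin(np.linspace(0, 4 * np.pi, 260))   # longer, scaled segment to compare
print("DTW distance:", round(dtw_distance(normal, test), 3))
```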
Hardware Accelerator to Compute the Minimum Embedding Dimension of ECG Records

In this paper, a parallel hardware implementation to accelerate the computation of the minimum embedding dimension is presented. The estimation of the minimum embedding dimension is a time-consuming task necessary to start the non-linear analysis of biomedical signals. The design presented has maximum performance and reconfigurability as its main goals. The design process is explained, giving details on the decisions taken to achieve massive parallelization, as well as the methodology used to reduce hardware usage while keeping high mathematical accuracy. The results show that hardware acceleration achieves a speedup of three orders of magnitude in comparison to a purely software approach.

Pablo Pérez-Tirador, Gabriel Caffarena, Constantino A. García, Abraham Otero, Rafael Raya, Rodrigo Garcia-Carmona
Low-Power, Low-Latency Hermite Polynomial Characterization of Heartbeats Using a Field-Programmable Gate Array

The characterization of the heartbeat is one of the first and most important steps in the processing of the electrocardiogram (ECG) given that the results of the subsequent analysis depend on the outcome of this step. This characterization is computationally intensive, and both off-line and on-line (real-time) solutions to this problem are of great interest. Typically, one uses either multi-core processors or graphics processing units which can use a large number of parallel threads to reduce the computational time needed for the task. In this paper, we consider an alternative approach, based on the use of a dedicated hardware implementation (using a field-programmable gate-array (FPGA)) to solve a critical component of this problem, namely, the best-fit Hermite approximation of a heartbeat. The resulting hardware implementation is characterized using an off-the-shelf FPGA card. The single beat best-fit computation latency when using six Hermite basis polynomials is under $$0.5\,ms$$ with a power dissipation of 3.1 W, demonstrating the possibility of true real-time characterization of heartbeats for online patient monitoring.

Kartik Lakhotia, Gabriel Caffarena, Alberto Gil, David G. Márquez, Abraham Otero, Madhav P. Desai
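
A floating-point reference for the computation the FPGA accelerates, the best-fit expansion of a beat onto six Hermite basis functions, can be written as an ordinary least-squares problem; the beat below is a synthetic pulse and the width parameter sigma is an arbitrary choice, not the authors' design values.

```python
# Least-squares fit of a synthetic heartbeat onto six orthonormal Hermite basis functions.
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def hermite_basis(t, n_funcs=6, sigma=1.0):
    """Orthonormal Hermite functions phi_n(t/sigma) sampled on the grid t."""
    x = t / sigma
    cols = []
    for n in range(n_funcs):
        coeffs = [0] * n + [1]                        # selects the physicists' Hermite H_n
        norm = 1.0 / sqrt((2 ** n) * factorial(n) * sqrt(pi) * sigma)
        cols.append(norm * hermval(x, coeffs) * np.exp(-x ** 2 / 2))
    return np.stack(cols, axis=1)                     # shape (len(t), n_funcs)

t = np.linspace(-1, 1, 250)                           # samples around the beat's fiducial point
beat = np.exp(-(t / 0.12) ** 2)                       # toy heartbeat waveform
Phi = hermite_basis(t, n_funcs=6, sigma=0.15)
coeffs, *_ = np.linalg.lstsq(Phi, beat, rcond=None)   # best-fit Hermite expansion coefficients
print("Hermite coefficients:", np.round(coeffs, 4))
```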
Assessing Parallel Heterogeneous Computer Architectures for Multiobjective Feature Selection on EEG Classification

High-dimensional multi-objective optimization will open promising approaches to many applications on bioinformatics once efficient parallel procedures are available. These procedures have to take advantage of the present heterogeneous architectures comprising multicore CPUs and GPUs. In this paper, we describe and analyze several OpenCL implementations for an application comprising multiobjective feature selection for clustering in an EEG classification task on high-dimensional patterns. These implementation alternatives correspond to different uses of multicore CPU and GPU platforms to process irregular data codes. Depending on the dataset used, we have reached speedups of up to 14.9 and 17.2 with up to 24 threads for the implemented OpenCL CPU kernels and of up to 7.1 and 9.1 with up to 13 SMX processors and 256 local work-items for our OpenCL GPU kernels. Nevertheless, to provide this level of performance, careful considerations about the use of the memory hierarchy of the heterogeneous architecture and different strategies to cope with the irregularity of our target application have to be taken into account.

Juan José Escobar, Julio Ortega, Jesús González, Miguel Damas

Computational Systems for Modelling Biological Processes

Frontmatter
Prediction of Proinflammatory Potentials of Engine Exhausts by Integrating Chemical and Biological Features

The increasing prevalence of immune-related diseases has raised concerns about the immunotoxicity of engine exhausts. The evaluation of immunotoxicity associated with engine exhausts has relied on expensive and time-consuming experiments. In this study, a computational method named CBM was developed for predicting proinflammatory potentials of engine exhausts using chemical and biological data which are routinely analyzed for toxicity evaluation. The CBM model, based on a principal component regression algorithm, performs well, with high correlation coefficient values of 0.972 and 0.849 obtained from the training and independent test sets, respectively. In contrast, chemical or biological features alone showed poor correlation with the toxicity. The results indicate the importance of utilizing both chemical and biological features to develop an effective model. The proposed method could be further developed and applied to predict bioactivities of mixtures.

Chia-Chi Wang, Ying-Chi Lin, Yuan-Chung Lin, Syu-Ruei Jhang, Chun-Wei Tung
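
The modelling step named above, principal component regression on combined chemical and biological descriptors, has a generic skeleton like the one below; the random feature matrix, the synthetic endpoint and the choice of five components are placeholders, not the CBM model itself.

```python
# Generic principal component regression skeleton; data and component count are placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 40))            # 30 exhaust samples x 40 chemical/biological features
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.3, size=30)   # synthetic proinflammatory endpoint

pcr = make_pipeline(PCA(n_components=5), LinearRegression())
y_hat = cross_val_predict(pcr, X, y, cv=5)
print("cross-validated correlation coefficient:", round(np.corrcoef(y, y_hat)[0, 1], 3))
```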
Calculating Elementary Flux Modes with Variable Neighbourhood Search

In this work, we calculate Elementary Flux Modes (EFMs) from metabolic networks using a trajectory-based metaheuristic, Variable Neighbourhood Search (VNS). This method is based on the local exploration around an incumbent solution and the subsequent visits to “neighbourhoods” (i.e., other areas of the search space) when the exploration is not successful in improving the objective function. This strategy ensures a suitable balance between exploration and exploitation, which is the key point in metaheuristic-based optimization. Making use of linear programming and the Simplex method, a VNS-based metaheuristic has been designed and implemented. This algorithm iteratively solves the linear programs resulting from the formulation of different hypotheses about the metabolic network. These solutions are, when feasible, EFMs. The application of the proposed method on a benchmark problem corroborates its efficacy.

Jose A. Egea, José M. García
Using Nets-Within-Nets for Modeling Differentiating Cells in the Epigenetic Landscape

In this work the authors propose the use of a high-level Petri net formalism for modeling developmental processes at the cell level, taking explicitly into account the role of epigenetic regulation.

Roberta Bardini, Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino
Simulations of Cardiac Electrophysiology Combining GPU and Adaptive Mesh Refinement Algorithms

Computer models have become valuable tools for the study and comprehension of the complex phenomena of cardiac electrophysiology. However, the high complexity of the biophysical processes translates into complex mathematical and computational models. In this paper we evaluate a hybrid multicore and graphics processing unit numerical algorithm based on mesh adaptivity and on the finite volume method to cope with the complexity and to accelerate these simulations. This is a very attractive approach since the electrical wavefront corresponds to only a small fraction of the cardiac tissue. Usually, the numerical solution of the partial differential equations that model the phenomenon requires very fine spatial discretization to follow the wavefront, which is approximately 0.2 mm. The use of uniform meshes leads to high computational cost as it requires a large number of mesh points. In this sense, the tests reported in this work show that simulations of three-dimensional models of cardiac tissue have been accelerated by more than 626 times using the adaptive mesh algorithm together with its parallelization, with no significant loss in accuracy.

Rafael S. Oliveira, Bernardo M. Rocha, Denise Burgarelli, Wagner Meira Jr., Rodrigo W. dos Santos
A Plasma Flow Model in the Interstitial Tissue Due to Bacterial Infection

Diseases due to infections might lead to death. Fever is often the first sign of an infection; other signs are skin that is hot to the touch, shivering, aching muscles, pain, redness, swelling and so on, depending on the kind of infection. This study is a first attempt to model one of the infection symptoms, the edema. Briefly, edema may be caused by increased blood vessel wall permeability, which leads to a swollen, red area. Neutrophil-bacteria interactions trigger a chain of cytokine reactions which in turn change the vessel wall permeability, leading to an increase in interstitial fluid pressure. All these interactions are modeled using an n-phase partial differential equation system based on porous media assumptions. Model solutions are obtained using the finite-volume method and the upwind scheme. Finally, the numerical results are qualitatively compared with experimental data available in the literature, presenting good agreement.

Ruy Freitas Reis, Rodrigo Weber dos Santos, Marcelo Lobosco
Reactive Interstitial and Reparative Fibrosis as Substrates for Cardiac Ectopic Pacemakers and Reentries

Dangerous cardiac arrhythmias have frequently been associated with focal sources of fast pulses, i.e. ectopic pacemakers. However, there is a lack of experimental evidence that could explain how ectopic pacemakers form in cardiac tissue. In recent studies, we have proposed a new theory for the genesis of ectopic pacemakers in pathological cardiac tissues: reentry inside microfibrosis, i.e., a small region where excitable myocytes and non-conductive material coexist. In this work, we continue this investigation by comparing different types of fibrosis, reparative and reactive interstitial fibrosis. We use detailed and modern models of cardiac electrophysiology that account for the micro-structure of cardiac tissue. In addition, for the solution of our models we use, for the first time, a new numerical algorithm based on the Uniformization method. Our simulation results suggest that both types of fibrosis can support reentries, and therefore can generate in-silico ectopic pacemakers. However, the probability of reentry differs quantitatively between the types of fibrosis. In addition, the new Uniformization method yields a 20-fold increase in cardiac tissue simulation speed and was therefore an essential technique that allowed the execution of over a thousand simulations.

Rafael Sachetto Oliveira, Bruno Gouvêa de Barros, Johnny Moreira Gomes, Marcelo Lobosco, Sergio Alonso, Markus Bär, Rodrigo Weber dos Santos
Miyazawa-Jernigan Contact Potentials and Carter-Wolfenden Vapor-to-Cyclohexane and Water-to-Cyclohexane Scales as Parameters for Calculating Amino Acid Pair Distances

The difference between amino acid chemical properties that correlates with the exchangeability of protein sequence residues is often analysed using the approach proposed by Grantham (1974). His difference formula, i.e., matrix, for calculating the distances between amino acid pairs of a protein consists of three essential amino acid physicochemical properties, composition, polarity and volume, which are significantly correlated with the substitution frequencies of protein residues. Miyata et al. (1979) re-evaluated this concept and showed that the degree of amino acid difference is just as adequately explained by only two physicochemical factors, volume and polarity. The Miyazawa-Jernigan relative partition/hydrophobic energies ($$\varepsilon = \Delta e_{ir}$$) and the Carter-Wolfenden vapor-to-cyclohexane scale ($$G_{v>c} = \Delta G_{v>c}$$) are two alternative amino acid physicochemical parameters that are strongly correlated with polarity and volume/mass, respectively. We show that the Miyazawa-Jernigan residue contact potential could be used instead of the Grantham polarity and composition parameters to derive an updated Miyata matrix. This substitution permits correction of the Miyata matrix for the amino acid parameters of contact energies, repulsive packing energies, secondary structure energies, and Grantham’s composition property. Distance values calculated between both (classic and updated) Miyata matrices exhibit a strong correlation of r = 0.91. The possibility of analyzing residue distances based on the Carter-Wolfenden water-to-cyclohexane (w > c) and vapor-to-cyclohexane (v > c) scales instead of the amino acid polarity and volume parameters is also discussed, and a new distance matrix is derived.

Nikola Štambuk, Paško Konjevoda, Zoran Manojlović

eHealth

Frontmatter
Inter-observer Reliability and Agreement Study on Early Diagnosis of Diabetic Retinopathy and Diabetic Macular Edema Risk

The degree of inter-observer agreement on early diagnosis of diabetic retinopathy (DR) and diabetic macular edema (DME) risk has been assessed in this paper. Three sets of DR and DME risk ratings on 529 diabetic patients were independently built by ophthalmologists of the Andalusian (Spain) Health Service through observation of two macula-centered retinographies from these patients (one image per eye, 1058 images). DR was graded on a 0–3 scale from DR-unrelated to severe DR, while DME risk was graded on a 0–2 scale from no risk to moderate-severe risk. Inter-rater reliability (IRR) assessment was performed with the intra-class correlation (ICC) and two kappa-like statistics, Light’s kappa and Fleiss’ kappa. ICC-computed IRR showed excellent agreement between our three coders: values were 0.844 (95 % CI, 0.822–0.865) and 0.833 (95 % CI, 0.805–0.853) for DR and DME ratings, respectively. Kappa index-quantified assessment resulted in substantial agreement, as both kappa indexes rendered values around 0.60 for DR and 0.75 for DME ratings. All computed IRR metrics demonstrated high inter-observer agreement and consistency among DR degree and DME risk diagnoses. Reliable diagnosis provided by human experts supports the generation of reference standards that can be used in the development of automatic DR diagnosis systems.

Manuel Emilio Gegundez-Arias, Carlos Ortega, Javier Garrido, Beatriz Ponte, Fatima Alvarez, Diego Marin
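
One of the agreement statistics used in the study, Fleiss' kappa, can be computed from a subjects-by-raters grade matrix as sketched below; the simulated ratings for 529 patients and three raters are invented and do not reproduce the study's values.

```python
# Fleiss' kappa from scratch on simulated 0-3 grades for 529 subjects and 3 raters.
import numpy as np

def fleiss_kappa(ratings, n_categories):
    """ratings: integer array of shape (n_subjects, n_raters) with values in [0, n_categories)."""
    n_subjects, n_raters = ratings.shape
    # counts[i, k] = number of raters assigning category k to subject i
    counts = np.stack([(ratings == k).sum(axis=1) for k in range(n_categories)], axis=1)
    p_cat = counts.sum(axis=0) / (n_subjects * n_raters)          # overall category proportions
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), (p_cat ** 2).sum()
    return (P_bar - P_e) / (1 - P_e)

rng = np.random.default_rng(0)
base = rng.integers(0, 4, size=(529, 1))                           # latent DR grade per patient
ratings = np.clip(base + rng.integers(-1, 2, size=(529, 3)), 0, 3) # three noisy raters
print("Fleiss' kappa:", round(fleiss_kappa(ratings, 4), 3))
```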
Automated Detection of Diabetic Macular Edema Risk in Fundus Images

This paper is aimed at assessing the initial performance of a computer-based system to detect the risk of diabetic macular edema (DME). The development of this tool was funded by the Health Ministry of the Andalusian Regional Government (Spain) with the purpose of being integrated into a complete system for early diagnosis of diabetic retinopathy (DR). The algorithmic methods are based on the detection of retinal exudates (early ophthalmic signs of DME) by fundus image processing. It has been tested on a set of 1058 macula-centred retinographies from people with diabetes at risk for retinal diseases. Each of the images was rated on a 0–2 scale (from no DME risk to moderate-severe risk) created from the observations of ophthalmologic specialists of three Andalusian Health Service Medical Centres. Since these three sets of DME expert ratings showed high agreement and consistency, a consensus diagnosis was built and used as a ground truth. System evaluation was carried out by measuring the sensitivity and specificity of automated DME risk detection against this clinical reference diagnosis. In addition, system failures in real cases of DME risk (false negatives) and their clinical importance were also measured. The system showed several promising operating points, being able to work at a sensitivity level comparable to human experts, with no clinically-important failures, and enough specificity from a hypothetical practical implementation point of view. Thus, it demonstrated 0.9039 sensitivity per image (compared with 0.7948, 0.9345 and 0.8690 for the specialists), with all false negatives graded as mild DME risk, and 0.7696 specificity. This last value indicates that over 75 % of the images with no apparent DME risk are correctly identified by the system. The initial performance assessment shows that the presented system for the detection of DME risk is a suitable tool to be integrated into a complete DR pre-screening tool for the automated management of patients within a screening programme. Progress in this integration is definitely associated with the need to carry out a comprehensive system evaluation.

Diego Marin, Manuel Emilio Gegundez-Arias, Carlos Ortega, Javier Garrido, Beatriz Ponte, Fatima Alvarez
Use of Mobile Application for Nutrition Health Education

In busy lives, people do not get enough physical activity and have poor eating habits, and many do not understand nutrition and health knowledge about food and drink. As a result, their risk of obesity and chronic disease is relatively high, and learning nutrition and health knowledge is very important for reducing these risks. However, nutrition health education is generally delivered verbally by a nurse at the hospital; this learning situation is short, and people cannot fully understand the information. This paper therefore proposes the use of a mobile application for nutrition health education. When people or patients need health knowledge, they can use the application on a mobile device to obtain the relevant information anytime, anywhere. Because the contents of nutrition health education are large and complex, it is impossible for people or patients to understand everything they need in a short time. This application therefore allows them to focus on their own requirements and retrieve the relevant information.

Hsiao-Hui Li, Mei-Hua Luo, Yuan-Hsun Liao

Tools for Next Generation Sequencing Data Analysis

Frontmatter
Automatic Workflow for the Identification of Constitutively-Expressed Genes Based on Mapped NGS Reads

Expression analyses such as quantitative and/or real-time PCR require the use of reference genes for normalization in order to obtain reliable assessments. The expression levels of these reference genes must remain constant in all the different experimental conditions and/or tissues under study. Traditionally, housekeeping genes have been used for this purpose, but most of them have been reported to vary their expression levels under some experimental conditions. Consequently, the choice of the best reference genes should be tested and validated in every experimental scenario. Microarray data are not always available for the search of appropriate reference genes, but NGS experiments are increasingly common. For this reason, an automatic workflow based on mapped NGS reads is presented with the aim of obtaining putative reference genes for a given species in the experimental conditions of interest. The calculation of the coefficient of variation (CV) and a simple, normalized expression value such as RPKM per transcript allows for filtering and selecting those transcripts expressed homogeneously and consistently in all analyzed conditions. This workflow has been tested with Roche/454 reads obtained from olive (Olea europaea L.) pollen and pistil at different developmental stages, as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana. Some of the putative candidate reference genes have been experimentally validated.

Rosario Carmona, Pedro Seoane, Adoración Zafra, María José Jiménez-Quesada, Juan de Dios Alché, M. Gonzalo Claros
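
The filtering idea described above (a per-transcript RPKM table across conditions, then a coefficient-of-variation cut) condenses to a few lines of pandas; the counts, lengths and thresholds below are invented for illustration and are not the workflow's published defaults.

```python
# RPKM + coefficient-of-variation filter on an invented three-condition count table.
import pandas as pd

counts = pd.DataFrame({"pollen": [900, 40, 3100], "pistil": [950, 400, 2900],
                       "leaf": [880, 10, 3000]}, index=["T1", "T2", "T3"])
lengths_kb = pd.Series([1.2, 0.8, 2.5], index=counts.index)      # transcript lengths in kb

lib_sizes_millions = counts.sum(axis=0) / 1e6                    # mapped reads per library
rpkm = counts.div(lib_sizes_millions, axis=1).div(lengths_kb, axis=0)
cv = rpkm.std(axis=1) / rpkm.mean(axis=1)

candidates = rpkm[(cv < 0.2) & (rpkm.mean(axis=1) > 10)]         # stably and decently expressed
print("candidate reference transcripts:\n", candidates.round(1))
```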
Influence of Normalization on the Analysis of Electroanatomical Maps with Manifold Harmonics

Electrical and anatomical maps (EAM) are built by cardiac navigation systems (CNS) and by Electrocardiographic Imaging systems for supporting arrhythmia ablation during electrophysiological procedures. Manifold Harmonics Analysis (MHA) has been proposed for analyzing the spectral properties of EAM of voltages and times in CNS by using a representation of the EAM supported by the anatomical mesh. MHA decomposes the EAM into a set of basis functions and coefficients which allow the EAM to be conveniently reconstructed. In this work, we addressed the effect of normalization of the mesh spatial coordinates and the bioelectrical feature on the EAM decomposition for identifying regions with strong variation of the feature. For this purpose, simulated EAM with three foci in a ventricular and in an atrial tachycardia were used. These foci were located at different distances from one another, and different voltages were also considered. Our experiments show that it is possible to identify the origin of the foci by considering only the first 3–5 projections when normalization was applied, both for atrial and ventricular EAM. In this case, better quality in the EAM reconstruction was also obtained when using fewer basis functions. Hence, we conclude that normalization can help to identify regions with strong feature variation in the first stages of the EAM reconstruction.

Margarita Sanromán-Junquera, Inmaculada Mora-Jiménez, Arcadio García-Alberola, Antonio Caamaño-Fernández, José Luis Rojo-Álvarez
AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing

Speeding up the alignment of DNA reads or assembled contigs against a protein database has been a challenge up to now. The recent tool DIAMOND has significantly improved on the speed of BLASTX and RAPSearch, while giving a similar degree of sensitivity. Yet for applications like metagenomics, where large amounts of data are involved, DIAMOND still takes a lot of time. This paper introduces an even faster protein alignment tool, called AC-DIAMOND, which attempts to speed up DIAMOND via better SIMD parallelization and more space-efficient indexing of the reference database; the latter allows more queries to be loaded into memory and processed together. Experimental results show that AC-DIAMOND is about 4 times faster than DIAMOND on aligning DNA reads or contigs, while retaining the same sensitivity as DIAMOND. For example, the latest assembly of the Iowa prairie soil metagenomic dataset generates over 9 million contigs, with a total size of about 7 Gbp; when aligning these contigs to the protein database NCBI-nr, DIAMOND takes 4 to 5 days, whereas AC-DIAMOND takes about 1 day. AC-DIAMOND is available for testing at http://ac-diamond.sourceforge.net.

Huijun Mai, Dinghua Li, Yifan Zhang, Henry Chi-Ming Leung, Ruibang Luo, Hing-Fung Ting, Tak-Wah Lam
Prioritization of Schizophrenia Risk Genes by a Network-Regularized Logistic Regression Method

Schizophrenia (SCZ) is a severe mental disorder with a large genetic component. While recent large-scale microarray- and sequencing-based genome wide association studies have made significant progress toward finding SCZ risk variants and genes of subtle effect, the interactions among them were not considered in those studies. Using a protein-protein interaction network both in our regression model and to generate a SCZ gene subnetwork, we developed an analytical framework with Logit-Lapnet, a graph Laplacian-regularized logistic regression, for whole exome sequencing (WES) data analysis to detect SCZ gene subnetworks. Using simulated data from a sequencing-based association study, we compared the performance of Logit-Lapnet with other logistic regression (LR)-based models. We use Logit-Lapnet to prioritize genes according to their coefficients and select top-ranked genes as seeds to generate the gene subnetwork associated with SCZ. The comparison demonstrated not only the applicability but also the better performance of Logit-Lapnet in scoring disease risk genes using sequencing-based association data. We applied our method to SCZ whole exome sequencing data and selected top-ranked risk genes, the majority of which are either known SCZ genes or genes potentially associated with SCZ. We then used the seed genes to construct SCZ gene subnetworks. This result demonstrates that, by ranking genes according to their disease contributions, our method scores and thus prioritizes disease risk genes for further investigation. An implementation of our approach in MATLAB is freely available for download at: http://zdzlab.einstein.yu.edu/1/publications/LapNet-MATLAB.zip.

Wen Zhang, Jhin-Rong Lin, Rubén Nogales-Cadenas, Quanwei Zhang, Ying Cai, Zhengdong D. Zhang
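
The flavor of a graph Laplacian-regularized logistic regression can be conveyed with a plain gradient-descent sketch on random stand-ins: a logistic loss plus a smoothness penalty beta^T L beta over a protein-protein interaction graph. The sparsity term and the authors' actual optimizer and data are omitted; every array and constant below is invented.

```python
# Gradient descent on logistic loss + lambda * beta^T L beta; all data are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.integers(0, 3, size=(n, p)).astype(float)       # genotype-like matrix (0/1/2)
y = rng.integers(0, 2, size=n).astype(float)            # case/control labels

A = (rng.random((p, p)) < 0.1).astype(float)            # random "PPI" adjacency matrix
A = np.triu(A, 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A                          # graph Laplacian
lam = 0.5

beta = np.zeros(p)
for _ in range(500):
    z = X @ beta
    grad = X.T @ (1 / (1 + np.exp(-z)) - y) / n + 2 * lam * L @ beta
    beta -= 0.01 * grad

ranking = np.argsort(-np.abs(beta))                     # genes prioritized by |coefficient|
print("top genes by coefficient magnitude:", ranking[:5])
```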
GNATY: Optimized NGS Variant Calling and Coverage Analysis

Next generation sequencing produces an ever-increasing amount of data, requiring increasingly fast computing infrastructures to keep up. We present GNATY, a collection of tools for NGS data analysis, aimed at optimizing parts of the sequence analysis process to reduce the hardware requirements. The tools are developed with efficiency in mind, using multithreading and other techniques to speed up the analysis. The architecture has been verified by implementing a variant caller based on the Varscan 2 variant calling model, achieving a speedup of nearly 18 times. Additionally, the flexibility of the algorithm is demonstrated by applying it to coverage analysis. Compared to BEDtools 2, GNATY produced the same analysis results in only half the time. The speed increase allows for faster data analysis and more flexibility to analyse the same sample using multiple settings. The software is freely available for non-commercial usage at http://gnaty.phenosystems.com/.

Beat Wolf, Pierre Kuonen, Thomas Dandekar
De Novo Assembly and Cluster Analysis of Siberian Larch Transcriptome and Genome

We studied the Siberian larch (Larix sibirica) transcriptome by performing de novo assembly and cluster analysis of contig frequency dictionaries. Some preliminary results of a similar study of the larch genome are also presented. It was found that the larch transcriptome exhibits a number of unexpected symmetries in the statistical and combinatorial properties of these entities.

Michael Sadovsky, Yulia Putintseva, Vladislav Birukov, Serafima Novikova, Konstantin Krutovsky

Assistive Technology for People with Neuromotor Disorders

Frontmatter
Sainet: An Image Processing App for Assistance of Visually Impaired People in Social Interaction Scenarios

This work describes a mobile application (Sainet) for image processing as an assistive technology devoted to visually impaired users. The app targets the Android platform and is usually executed on a mobile device equipped with a back camera for image acquisition. Moreover, a wireless Bluetooth headphone provides audio feedback to the user. Sainet has been conceived as a tool to assist the user in social interaction scenarios. It is capable of providing audible information about the number and position (distance and orientation) of the interlocutors in front of the user. For validation purposes, the app has been tested by a blind user, who has provided valuable insights about its strengths and weaknesses.

Jesus Salido, Oscar Deniz, Gloria Bueno
The Effect of Transcranial Direct Current Stimulation (tDCS) Over Human Motor Function

Transcranial Direct Current Stimulation (tDCS) is a non-invasive, weak cortical neurostimulation technique which applies direct currents through two electrodes of opposite polarity placed over a conductive surface (e.g. the scalp). Over the last 15 years it has demonstrated positive effects in a wide range of psychopathologies and neurological disorders, its neurophysiological modulatory effect on neuromotor impairments being one of the most important targets in tDCS research. Thus, different motor-related pathologies have been improved by tDCS, such as motor alterations after stroke, Parkinson's disease, cerebral palsy in childhood, multiple sclerosis, etc. The positive effects of tDCS on motor abilities, both in pathological conditions and in healthy populations, make it an interesting option for inducing neurophysiological changes that complement traditional rehabilitation procedures. The comprehension of its neurophysiological and biochemical effects, the development of more idiographic procedures, and its integration with pharmacological treatments are mandatory in order to further improve its usage in rehabilitation approaches.

Cristian Pérez-Fernández, Ana Sánchez-Kuhn, Rosa Cánovas, Pilar Flores, Fernando Sánchez-Santed
Evaluation of Cervical Posture Improvement of Children with Cerebral Palsy After Physical Therapy with a HCI Based on Head Movements and Serious Videogames

This paper presents the preliminary results of a novel rehabilitation therapy for cervical and trunk control of children with cerebral palsy (CP). The therapy is based on the use of an inertial sensor to control a set of serious videogames with movements of the head. Ten users with CP participated in the study, divided into experimental and control groups. Ten sessions of therapy produced improvements in head and trunk control that were higher in the experimental group for the Visual Analogue Scale (VAS), Goal Attainment Scaling (GAS) and Trunk Control Measurement Scale (TCMS). Significant differences (27 % vs. 2 % improvement) were found between the experimental and control groups for TCMS (p < 0.05). The kinematic assessment shows some improvements in active and passive range of motion, but no significant differences were found between pre- and post-therapy. This new strategy, together with traditional rehabilitation therapies, could allow the child to reach maximum levels of function in the trunk and cervical regions.

Miguel A. Velasco, Rafael Raya, Luca Muzzioli, Daniela Morelli, Marco Iosa, Febo Cincotti, Eduardo Rocon
Enriched Environment Affects Positively a Progression of Neurodegeneration: Elastic Maps-Based Analysis

We studied a model to identify the factors that may affect and retard the development of Alzheimer's disease. The experimental rats were kept in two kinds of environment, a standard one vs. an enriched one, and amyloid protein was injected into both groups of rats to simulate Alzheimer's disease. It was found that the enriched environment is the key factor in retarding the development of the neurodegenerative disorder.

Michael Sadovsky, Andrey Morgun, Alla Salmina, Natalia Kuvacheva, Elena Khilazheva, Elena Pozhilenkova

Fundamentals of Biological Dynamics and Maximization of the Information Extraction from the Experiments in the Biological Systems

Frontmatter
Clustering of Multi-image Sets Using Rényi Information Entropy

We propose a clustering method based on the calculation of variables derived from the $\alpha$-dependent Rényi information entropy – a point information gain entropy ($H_\alpha$) and a point information gain entropy density ($\varXi_\alpha$) – which measure an information-entropic distance between two multidimensional distributions. The matrices of $H_\alpha$/$\varXi_\alpha$ values, as functions of the parameter $\alpha$ and the label of a multidimensional set's object, are classified into groups using a standard k-means algorithm. The method is demonstrated on two multi-image series which differ in their origin, the number of images in the sets, the number of image color channels, and the pixel resolution.
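
The exact definitions of $H_\alpha$ and $\varXi_\alpha$ are given in the paper; the sketch below only illustrates the underlying ingredients under simplifying assumptions: the change in Rényi entropy when a single occurrence is removed from a distribution, computed over a grid of $\alpha$ values for toy 3-bit "images", followed by the k-means step.

```python
import numpy as np
from sklearn.cluster import KMeans

def renyi_entropy(p, alpha):
    """Rényi entropy of a discrete distribution p, for alpha != 1."""
    p = p[p > 0]
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def point_information_gain(counts, value, alpha):
    """Change of Rényi entropy when one occurrence of `value` is removed;
    a simplified stand-in for the paper's point information gain."""
    if counts[value] == 0:
        return 0.0
    full = counts / counts.sum()
    reduced = counts.astype(float).copy()
    reduced[value] -= 1
    reduced /= reduced.sum()
    return renyi_entropy(full, alpha) - renyi_entropy(reduced, alpha)

rng = np.random.default_rng(1)
images = [rng.integers(0, 8, size=1000) for _ in range(6)]   # toy 3-bit "images"
alphas = [0.5, 2.0, 4.0]
features = []
for img in images:
    counts = np.bincount(img, minlength=8)
    features.append([point_information_gain(counts, v, a)
                     for a in alphas for v in range(8)])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(features))
print("cluster assignment per image:", labels)
```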

Renata Rychtáriková
Least Information Loss (LIL) Conversion of Digital Images and Lessons Learned for Scientific Image Inspection

Nowadays, most digital images are captured and stored as 16- or 12-bit-per-pixel integers; however, most personal computers can only display images as 8-bit-per-pixel integers. Besides, each microarray experiment produces hundreds of images, which need larger storage space if the images are stored at 16 or 12 bit. The display conversion is in most cases performed on single images by an algorithm which is not apparent to the user. A simple way to avoid the problem is to convert 16- or 12-bit images to 8 bit by direct division of the intensity range into 256 sections and counting the number of points in each of them. Although this approach preserves the proportions of the camera signals, it leads to severe loss of information due to the reduced intensity depth resolution. The main aim of this article is to introduce the least information loss (LIL) algorithm as a novel approach to minimize the information loss caused by the transformation of the primary camera signals (16 or 12 bit per pixel) to 8 bit per pixel. The LIL algorithm is based on the omission of unoccupied intensity levels and the transformation of the remaining levels to 8 bit. This approach not only preserves information by storing the intensity intervals in the image EXIF file for further analysis, but also improves object contrast for better visual inspection and object-oriented classification. The LIL algorithm may also be applied to image series, where it enables comparison of primary camera data at scales identical over the whole series. This is particularly important in cases where the coloration is only apparent and reflects various physical processes, such as in microscopy imaging.
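
A minimal sketch of the LIL idea described above: omit unoccupied intensity levels, remap the occupied ones onto 8 bit, and keep the level list as metadata. The published algorithm's exact remapping and EXIF handling may differ, so treat this as an approximation.

```python
import numpy as np

def lil_convert(img16):
    """Least-information-loss style conversion sketch: drop unoccupied
    intensity levels and remap the occupied ones onto 0..255."""
    occupied = np.unique(img16)                 # intensity levels actually present
    ranks = np.searchsorted(occupied, img16)    # each pixel -> rank of its level
    scale = 255.0 / max(len(occupied) - 1, 1)
    img8 = np.round(ranks * scale).astype(np.uint8)
    # The occupied levels would be stored as metadata (e.g. in EXIF) so the
    # original intensity scale can still be recovered for analysis.
    return img8, occupied

rng = np.random.default_rng(0)
img16 = rng.choice([100, 101, 4000, 4100, 60000], size=(4, 4)).astype(np.uint16)
img8, levels = lil_convert(img16)
print(img8)
print("occupied levels stored as metadata:", levels)
```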

Dalibor Štys, Tomáš Náhlík, Petr Macháček, Renata Rychtáriková, Mohammadmehdi Saberioon
Visual Exploration of Principles of Formation of Microscopic Image Intensities Using Image Explorer Software

The article demonstrates the most frequent mistakes made in the transformation of digital images in biology. An image is formed by a few physical processes which contribute to each color channel in a different way. In the case of a microscopic image in transmitted light, these processes are mainly light diffraction and absorption and the autofluorescence of objects, all of which are followed by distortion of the wavefronts by the microscope optics. The final image is then the result of these processes in the plane of the camera chip. The article further reports methods to avoid (i) misconceptions due to apparent coloration after transformation of the original signal on the camera chip into a color image and (ii) loss of resolution after reduction of the intensity depth from 12 bit to 8 bit.

Petr Císař, Tomáš Náhlík, Renata Rychtáriková, Petr Macháček
On Optimization of FRAP Experiments: Model-Based Sensitivity Analysis Approach

The accuracy in the determination of model parameters from data depends on the experimental setup. Progress in this area is often hindered by a lack of communication between experimentalists and mathematical modelers. We aim to point out a potential benefit in parameter inference when the design variables are chosen optimally. Our approach, although case independent, is illustrated on the FRAP (Fluorescence Recovery After Photobleaching) experimental technique. The core idea is closely related to sensitivity analysis, namely to the maximization of a sensitivity measure that depends on the experimental settings. The proposed modification of the FRAP experimental protocol is simple, and the enhancement of the whole parameter estimation process is significant.
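
As a generic illustration of the sensitivity-maximization idea (not the authors' FRAP model), the sketch below uses an assumed single-exponential recovery curve and finite differences to locate the sampling time at which the measurement is most sensitive to the rate parameter; the model, parameter values and design variable are all illustrative.

```python
import numpy as np

def frap_recovery(t, k, f_inf=1.0):
    """Simple single-exponential FRAP recovery model F(t) = f_inf*(1 - exp(-k*t)),
    a stand-in for the diffusion model used in the paper."""
    return f_inf * (1.0 - np.exp(-k * t))

def sensitivity(t, k, eps=1e-6):
    """Finite-difference sensitivity dF/dk at the measurement times t."""
    return (frap_recovery(t, k + eps) - frap_recovery(t, k - eps)) / (2 * eps)

k_true = 0.5
t = np.linspace(0.1, 20.0, 200)
s = np.abs(sensitivity(t, k_true))
# The sensitivity |dF/dk| peaks at t = 1/k; sampling densely around this time
# is the kind of design choice that improves the parameter estimate.
print("most informative sampling time ~", t[np.argmax(s)])
```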

Štěpán Papáček, Stefan Kindermann
A Semantic-Based Metadata Validation for an Automated High-Throughput Screening Workflow: Case Study in CytomicsDB

High-Throughput Screening (HTS) techniques are typically used to identify potential drug candidates. These types of experiments require the investment of a large amount of resources. The appropriate data management of HTS experiments has become a key challenge for succeeding in target validation. Current developments in imaging systems have to cope with computational requirements due to the significant increase in data volumes. However, no special care has been taken to ensure the consistency, integrity and reliability of the data managed in HTS experiments. The appropriate validation of the data used in an HTS experiment has turned out to be a key success factor in target validation, and thus a mandatory process to be included in the HTS workflow. This paper describes our research on the validation process as performed in CytomicsDB. This system is a modern RDBMS-based platform, designed to provide an architecture capable of dealing with the strict validation requirements during each stage of the HTS workflow. Furthermore, CytomicsDB has a flexible architecture which supports easy access to external repositories in order to validate experiment data.

Enrique Larios Vargas, Zhihan Xia, Joris Slob, Fons J. Verbeek
Reachability of the Therapeutic Target in the Systems with Parameters Switch

The human organism is a complex system whose functioning is still under investigation. Biological models of intercellular interactions are created for a better understanding of this complex system's behaviour and for the prediction of the system's response to given stimuli. A medical system, such as drug application to an organism, can be described by a piecewise non-linear model. In our work we consider the influence of parameter deviations in systems with a fixed terminal state. Two types of deviations are investigated: small changes in system parameters and changes in the switching time of a particular parameter. We considered three different types of systems: without self-regulation, with a negative feedback loop, and with a positive feedback loop. We examined the differences between these types and the influence of small changes in parameter values on the reachability of the therapeutic goal after drug application.
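
A minimal sketch of the setting described above, under assumed dynamics: a one-dimensional piecewise model whose degradation rate switches at the drug-application time, with a simple check of whether a therapeutic target level is reached within the observation window. The authors' models and parameter values differ; everything below is illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

def make_rhs(k_before, k_after, t_switch, production=1.0):
    """Piecewise dynamics dx/dt = production - k(t)*x, where the degradation
    rate k switches at the drug-application time t_switch."""
    def rhs(t, x):
        k = k_before if t < t_switch else k_after
        return [production - k * x[0]]
    return rhs

def reaches_target(k_before, k_after, t_switch, target, t_end=15.0):
    """True if the state reaches the therapeutic target level before t_end."""
    sol = solve_ivp(make_rhs(k_before, k_after, t_switch), (0.0, t_end), [0.0],
                    max_step=0.1)
    return bool(np.any(sol.y[0] >= target))

# Nominal switching time vs. a delayed one: the same target becomes unreachable.
print(reaches_target(1.0, 0.2, t_switch=5.0, target=4.0))
print(reaches_target(1.0, 0.2, t_switch=20.0, target=4.0))
```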

Magdalena Ochab, Krzysztof Puszynski, Andrzej Swierniak

High Performance Computing in Bioinformatics, Computational Biology and Computational Chemistry

Frontmatter
The Case for Docker in Multicloud Enabled Bioinformatics Applications

The introduction of next generation sequencing technologies brought not only huge amounts of biological data but also highly sophisticated and versatile analysis workflows and systems. These new challenges require reliable and fast deployment methods over high performance servers in the local infrastructure or in the cloud. The use of virtualization technology has provided an efficient solution to overcome the complexity of deployment procedures and to provide a safe, personalized execution box. However, the performance of applications running in virtual machines is worse than that of those running on the native infrastructure. Docker is a lightweight alternative to the usual virtualization technology, achieving notably better performance. In this paper, we explore use case scenarios for using Docker to deploy and execute sophisticated bioinformatics tools and workflows, with a focus on the sequence analysis domain. We also introduce an efficient implementation of the package elasticHPC-Docker to enable the creation of a Docker-based computer cluster in the private cloud and in commercial clouds like Amazon and Google. We demonstrate by experiments that the use of elasticHPC-Docker is efficient and reliable in both private and commercial clouds.

Ahmed Abdullah Ali, Mohamed El-Kalioby, Mohamed Abouelhoda
Real Time GPU-Based Segmentation and Tracking of the Left Ventricle on 2D Echocardiography

Left ventricle segmentation and tracking in ultrasound images are necessary tasks for cardiac diagnosis. These tasks are difficult due to the inherent problems of ultrasound images (i.e. low contrast, speckle noise, signal dropout, presence of shadows, etc.). In this paper, we propose an accurate and automatic method for left ventricle segmentation and tracking. The method is based on optical flow estimation for detecting the left ventricle center. Then, the contour is defined and tracked using convex hull and spline interpolation algorithms. In order to provide real-time processing of videos, we also propose an effective and adapted exploitation of new parallel and heterogeneous architectures that consist of both central (CPU) and graphics (GPU) processing units. The latter can be either NVIDIA or ATI graphics cards, since we provide both CUDA and OpenCL implementations. This improves the performance of our method thanks to the parallel exploitation of the large number of computing units within the GPU. Our experiments were conducted using a set of 11 normal and 17 diseased-heart ultrasound video sequences. The results show automatic, real-time left ventricle detection and tracking with a success rate of 92 %.
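
A hedged sketch of two of the ingredients named above, dense optical flow for locating the moving structure and a convex hull for the contour, using OpenCV on synthetic frames. The CUDA/OpenCL acceleration and the spline refinement are omitted, and the threshold values are illustrative, not the paper's.

```python
import cv2
import numpy as np

# Two synthetic grayscale frames with a bright blob (the "ventricle") that moves.
prev = np.zeros((128, 128), np.uint8)
curr = np.zeros((128, 128), np.uint8)
cv2.circle(prev, (60, 60), 20, 255, -1)
cv2.circle(curr, (64, 62), 20, 255, -1)

# Dense optical flow (Farnebäck); the region of strongest motion approximates
# the moving ventricle wall and yields a rough centre estimate.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
magnitude = np.linalg.norm(flow, axis=2)
ys, xs = np.where(magnitude > 0.5 * magnitude.max())
center = (int(xs.mean()), int(ys.mean()))

# Contour definition via the convex hull of the moving points
# (a spline could then be fitted through the hull for a smooth boundary).
points = np.column_stack([xs, ys]).astype(np.int32)
hull = cv2.convexHull(points)
print("centre:", center, "hull vertices:", len(hull))
```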

Sidi Ahmed Mahmoudi, Mohammed Ammar, Guillaume Luque Joris, Amine Abbou
Parallel Ant Colony Optimization for the HP Protein Folding Problem

Ant Colony Optimisation (ACO) is a bio-inspired, population-based metaheuristic which emulates the behavior of ant colonies to solve problems computationally. Indeed, it is a swarm-based algorithm, as it needs the interactions among all ants to provide good solutions to a particular problem. This collective computation is theoretically well suited for parallelisation, as several ants run in parallel looking for solutions and sharing their findings among them. In this paper, we design an ACO metaheuristic to solve the Protein Folding Problem using a simplified model (HP) that classifies amino acids as hydrophobic (H) or polar (P), according to the attraction or repulsion that the amino acid presents towards water. We also propose a parallel ACO version applied to the HP model on Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA). Our results reveal up to a 7$\times$ speed-up factor compared to a sequential counterpart version. Results and conclusions about this parallel version suggest a broader area of inquiry, where researchers within the field of bioinformatics may learn to adapt similar problems to the pairing of an optimization method and a GPU architecture.
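
A minimal, sequential sketch of ACO on the 2D HP lattice model; the paper's contribution is the CUDA parallelization, which is not shown, and the walk construction, pheromone update and parameter values here are simplified assumptions rather than the authors' algorithm.

```python
import random

def turn(d, move):                        # relative moves on the 2D lattice
    dx, dy = d
    return {"F": (dx, dy), "L": (-dy, dx), "R": (dy, -dx)}[move]

def hp_energy(seq, coords):
    """HP energy: -1 for every non-consecutive H-H pair on adjacent lattice sites."""
    index = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index.get(nb)
            if j is not None and j > i + 1 and seq[j] == "H":
                e -= 1
    return e

def build_walk(seq, pher, tries=100):
    """One ant builds a self-avoiding walk, biased by per-step pheromone."""
    for _ in range(tries):
        coords, d = [(0, 0), (1, 0)], (1, 0)
        for step in range(2, len(seq)):
            options = []
            for m in "FLR":
                nd = turn(d, m)
                nxt = (coords[-1][0] + nd[0], coords[-1][1] + nd[1])
                if nxt not in coords:
                    options.append((nd, nxt, pher[step][m]))
            if not options:
                break                               # dead end: restart this ant
            nd, nxt, _ = random.choices(options, weights=[o[2] for o in options])[0]
            d = nd
            coords.append(nxt)
        else:
            return coords                           # full-length self-avoiding walk
    return None

def aco_hp(seq, ants=20, iters=100, rho=0.1):
    pher = [{m: 1.0 for m in "FLR"} for _ in range(len(seq))]
    best, best_e = None, 1
    for _ in range(iters):
        for _ in range(ants):
            coords = build_walk(seq, pher)
            if coords is None:
                continue
            e = hp_energy(seq, coords)
            if e < best_e:
                best, best_e = coords, e
        for step_pher in pher:                      # evaporation
            for m in "FLR":
                step_pher[m] *= 1.0 - rho
        if best is not None:                        # reinforce the best walk so far
            d = (1, 0)
            for step in range(2, len(seq)):
                nd = (best[step][0] - best[step - 1][0],
                      best[step][1] - best[step - 1][1])
                m = next(m for m in "FLR" if turn(d, m) == nd)
                pher[step][m] += -best_e
                d = nd
    return best, best_e

random.seed(0)
_, energy = aco_hp("HPHPPHHPHPPHPHHPPHPH")
print("best HP energy found:", energy)
```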

Antonio Llanes, Carlos Vélez, Antonia M. Sánchez, Horacio Pérez-Sánchez, José M. Cecilia
Neuroimaging Registration on GPU: Energy-Aware Acceleration

We present a CUDA implementation, for the Kepler and Maxwell GPU generations, of neuroimaging registration based on the NiftyReg open-source library [1]. A wide range of strategies is deployed to accelerate the code, providing insightful guidelines to exploit the massive parallelism and memory hierarchy within emerging GPUs. Our efforts are analyzed from different perspectives: acceleration, numerical accuracy, power consumption and energy efficiency, in order to identify potential scenarios where performance per watt can be optimal in large-scale biomedical applications. Experimental results suggest that parallelism and arithmetic intensity represent the most rewarding ways on the road to high performance bioinformatics when power is a major concern.

Francisco Nurudín Álvarez, José Antonio Cabrera, Juan Francisco Chico, Jesús Pérez, Manuel Ujaldón
Unleashing the Graphic Processing Units-Based Version of NAMD

NAMD is a parallel molecular dynamics software package designed for high-performance simulations of large biomolecular systems. It scales from a single computer up to hundreds of processors on high-end parallel platforms. Additionally, considering the evolution of Graphics Processing Units (GPUs) into general-purpose, massively parallel co-processors, NAMD includes support for this kind of device to leverage its computational power. In this work we analyze the current NAMD GPU solution and develop an alternative based on Newton's third law. The results show a significant reduction in the execution time of the GPU computations, of up to 20 % when compared with a highly tuned version of the original GPU-enabled NAMD.
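
The alternative mentioned above exploits Newton's third law, F_ji = -F_ij, so that each particle pair is evaluated only once. A CPU-side sketch of that idea with a Lennard-Jones-like force is shown below; it is not NAMD's CUDA kernel, and the coordinates and constants are illustrative.

```python
import numpy as np

def pairwise_forces_symmetric(pos, eps=1.0, sigma=1.0):
    """Lennard-Jones-like forces computed once per pair; Newton's third law
    (F_ji = -F_ij) supplies the opposite contribution for free."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):            # each pair visited exactly once
            r = pos[i] - pos[j]
            d2 = np.dot(r, r)
            inv6 = (sigma ** 2 / d2) ** 3
            f = 24.0 * eps * (2.0 * inv6 ** 2 - inv6) / d2 * r
            forces[i] += f                    # action ...
            forces[j] -= f                    # ... and reaction
    return forces

pos = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.1, 0.3]])
print(pairwise_forces_symmetric(pos))
```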

Yamandú González, Pablo Ezzatti, Margot Paulino

Human Behavior Monitoring, Analysis and Understanding

Frontmatter
Consistency Verification of Marker-Less Gait Assessment System for Stair Walking

The number of elderly people is increasing drastically. To support them, gait information is under the spotlight, since it is related to fall risk and dementia. Among walking scenarios, a relatively high level of ability is needed for stair walking, as it requires balancing and loading. Conventionally, 3D motion capture devices have been used to acquire the parameters of stair walking. However, it is difficult to acquire such parameters in daily life, as the equipment needs complicated preparation and body-worn markers. In this study, we propose a system which can acquire daily stair walking parameters using only depth data obtained by a Kinect v2, without restraining the subject with markers. We confirmed the accuracy of our proposed system by comparison with a 3D motion capture system.

Ami Ogawa, Ayanori Yorozu, Akira Mita, Masaki Takahashi, Christos Georgoulas, Thomas Bock
Full Body Gesture Recognition for Human-Machine Interaction in Intelligent Spaces

This paper describes a proposal for a full body gesture recognition system to be used in an intelligent space to allow users to control their environment. We describe a successful adaptation of the traditional strategy applied in the design of spoken language recognition systems to the new domain of full body gesture recognition. The experimental evaluation has been done on a realistic task where different elements in the environment can be controlled by the users using gesture sequences. The evaluation results have been obtained by applying a rigorous experimental procedure, evaluating different feature extraction strategies. The average recognition rates achieved are around 97 % at the gestural sentence level and over 98 % at the gesture level, thus experimentally validating the proposal.

David Casillas-Perez, Javier Macias-Guarasa, Marta Marron-Romera, David Fuentes-Jimenez, Alvaro Fernandez-Rincon
A Web System for Managing and Monitoring Smart Environments

Smart environments have the ability to record information about the behavior of people by means of their interactions with the objects within the environment. This kind of environment provides solutions to address some of the problems associated with the growing size and ageing of the population, by means of activity recognition, monitoring of activities of daily living and adaptation of the environment. In this contribution, a Web system for managing and monitoring smart environments is introduced as a useful tool for activity recognition. The Web system offers information processing, accessible services and analytic capabilities. Furthermore, a case study monitored by the proposed Web system is presented in order to show its performance, usefulness and effectiveness.

Daniel Zafra, Javier Medina, Luis Martinez, Chris Nugent, Macarena Espinilla
A Study in Experimental Methods of Human-Computer Communication for Patients After Severe Brain Injuries

Experimental research in the domain of multimedia technology applied to medical practice is discussed, employing a prototype of an integrated multimodal system to assist the diagnosis and polysensory stimulation of patients after severe brain injuries. The system being developed includes, among others, an eye-gaze tracker and EEG monitoring of non-communicating patients after severe brain injuries. The proposed solutions are used for collecting and analyzing patients' responses and interactions induced by the multimodal stimulation, resulting in an assessment of the influence of the stimuli on the increase of the patients' cognitive and communicative functions with the use of intelligent data analysis methods.

Andrzej Czyzewski, Bozena Kostek

Pattern Recognition and Machine Learning in the -omics Sciences

Frontmatter
Random Forests for Quality Control in G-Protein Coupled Receptor Databases

G protein-coupled receptors are a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structures, such discrimination must rely on their primary amino acid sequences. In this study, we are interested not so much in achieving maximum sub-family discrimination accuracy as in exploring sequence misclassification behaviour. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always assigned to the same wrong sub-family. This analysis should assist database curators in receptor quality control tasks. Random Forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification.
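
A hedged sketch of how consistent misclassification can be gauged with Random Forests: repeated cross-validation on synthetic stand-in descriptors, counting how often each sample is misclassified and whether it almost always goes to the same wrong sub-family. The data, thresholds and model settings are illustrative, not the paper's.

```python
from collections import Counter, defaultdict

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for class C GPCR sequence descriptors (4 sub-families).
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

wrong = defaultdict(Counter)            # sample index -> counts of wrong labels
for seed in range(10):                  # repeat CV to gauge consistency
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train, test in cv.split(X, y):
        clf = RandomForestClassifier(n_estimators=200, random_state=seed)
        clf.fit(X[train], y[train])
        for idx, pred in zip(test, clf.predict(X[test])):
            if pred != y[idx]:
                wrong[idx][pred] += 1

# Consistently misclassified samples: often wrong and almost always assigned
# to the same wrong sub-family; these are the curation candidates.
for idx, counts in wrong.items():
    total = sum(counts.values())
    label, hits = counts.most_common(1)[0]
    if total >= 8 and hits / total >= 0.9:
        print(f"sample {idx}: true {y[idx]}, usually predicted {label} ({hits}/{total})")
```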

Aleksei Shkurin, Alfredo Vellido
Automated Quality Control for Proton Magnetic Resonance Spectroscopy Data Using Convex Non-negative Matrix Factorization

Proton Magnetic Resonance Spectroscopy (1H MRS) has proven its diagnostic potential in a variety of conditions. However, MRS is not yet widely used in clinical routine because of the lack of experts on its diagnostic interpretation. Although data-based decision support systems exist to aid diagnosis, they often take for granted that the data is of good quality, which is not always the case in a real application context. Systems based on models built with bad quality data are likely to underperform in their decision support tasks. In this study, we propose a system to filter out such bad quality data. It is based on convex Non-Negative Matrix Factorization models, used as a dimensionality reduction procedure, and on the use of several classifiers to discriminate between good and bad quality data.
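
A sketch of the pipeline shape described above, with scikit-learn's standard NMF standing in for the convex variant used in the paper, synthetic non-negative "spectra" in place of 1H MRS data, and two off-the-shelf classifiers; every name and value here is an illustrative assumption.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic non-negative "spectra": good ones are mixtures of two clean peaks,
# bad ones carry a strong broad noise component.
rng = np.random.default_rng(0)
axis = np.linspace(0, 1, 200)
peaks = np.vstack([np.exp(-(axis - 0.3) ** 2 / 0.002),
                   np.exp(-(axis - 0.7) ** 2 / 0.002)])
good = rng.random((100, 2)) @ peaks + 0.01 * rng.random((100, 200))
bad = rng.random((100, 2)) @ peaks + 0.5 * rng.random((100, 200))
X = np.vstack([good, bad])
y = np.array([1] * 100 + [0] * 100)     # 1 = good quality, 0 = bad quality

# Non-negative factorization as dimensionality reduction (standard NMF here,
# convex NMF in the paper), followed by classifiers on the mixing weights.
W = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0).fit_transform(X)
Xtr, Xte, ytr, yte = train_test_split(W, y, test_size=0.3, random_state=0, stratify=y)
for clf in (SVC(), LogisticRegression(max_iter=1000)):
    print(type(clf).__name__, clf.fit(Xtr, ytr).score(Xte, yte))
```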

Victor Mocioiu, Sreenath P. Kyathanahally, Carles Arús, Alfredo Vellido, Margarida Julià-Sapé
A Machine Learning Methodology for Enzyme Functional Classification Combining Structural and Protein Sequence Descriptors

The massive expansion of the worldwide Protein Data Bank (PDB) provides new opportunities for computational approaches which can learn from available data and extrapolate that knowledge to newly arriving instances. The aim of this work is to apply machine learning in order to train prediction models using data acquired by costly experimental procedures and to perform enzyme functional classification. Enzymes constitute key pharmacological targets, and knowledge of the chemical reactions they catalyze is very important for the development of potent molecular agents that will either suppress or enhance the function of a given enzyme, thus modulating a pathogenicity, an illness or even the phenotype. Classification is performed on two levels: (i) using structural information in a Support Vector Machine (SVM) classifier and (ii) based on amino acid sequence alignment and Nearest Neighbor (NN) classification. The classification accuracy is increased by fusing the two classifiers and reaches 93.4 % on a large dataset of 39,251 proteins from the PDB database. The method is very competitive with respect to accuracy of classification into the 6 enzymatic classes, while at the same time its computational cost during prediction is very small.
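
A hedged sketch of the two-level scheme: an SVM on one block of descriptors, a nearest-neighbour classifier on another, and a simple probability-averaging fusion. The synthetic features stand in for the structural and sequence descriptors, and the paper's actual descriptors and fusion rule differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for 6 enzymatic classes (EC 1-6); the first 40 columns
# play the role of structural descriptors, the last 20 of sequence features.
X, y = make_classification(n_samples=600, n_features=60, n_informative=20,
                           n_classes=6, n_clusters_per_class=1, random_state=1)
X_struct, X_seq = X[:, :40], X[:, 40:]

idx_tr, idx_te = train_test_split(np.arange(len(y)), test_size=0.3,
                                  random_state=0, stratify=y)

svm = SVC(probability=True).fit(X_struct[idx_tr], y[idx_tr])
knn = KNeighborsClassifier(n_neighbors=1).fit(X_seq[idx_tr], y[idx_tr])

# Simple late fusion: average the two classifiers' class probabilities.
proba = 0.5 * svm.predict_proba(X_struct[idx_te]) + \
        0.5 * knn.predict_proba(X_seq[idx_te])
fused = svm.classes_[np.argmax(proba, axis=1)]
print("fused accuracy:", np.mean(fused == y[idx_te]))
```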

Afshine Amidi, Shervine Amidi, Dimitrios Vlachakis, Nikos Paragios, Evangelia I. Zacharaki
Gene-Disease Prioritization Through Cost-Sensitive Graph-Based Methodologies

Finding genes associated with human genetic disorders is one of the most challenging problems in biomedicine. In this context, to guide researchers in detecting the most reliable candidate causative genes for the disease of interest, gene prioritization methods represent a necessary support to automatically rank genes according to their involvement in the disease under study. This problem is characterized by highly unbalanced classes (few causative and many more non-causative genes) and requires the adoption of cost-sensitive techniques to achieve reliable solutions. In this work we propose a network-based methodology for disease-gene prioritization designed expressly to cope with the data imbalance. Its validation on a benchmark composed of 708 selected Medical Subject Headings (MeSH) diseases shows that our approach is competitive with state-of-the-art methodologies, and its reduced time complexity makes its application feasible on large-size datasets.
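
As a hedged illustration of the general family this work belongs to (not the authors' algorithm), the sketch below propagates a prior that up-weights the few known causative genes over a toy gene network in random-walk-with-restart style, yielding a ranking of candidate genes.

```python
import numpy as np

def cost_sensitive_propagation(adj, positives, pos_weight=10.0, alpha=0.85, iters=50):
    """Random-walk-with-restart style scoring on a gene network: the few known
    causative genes are up-weighted to counter the class imbalance."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    prior = np.ones(n)
    prior[positives] *= pos_weight
    prior /= prior.sum()
    scores = prior.copy()
    for _ in range(iters):
        scores = alpha * P.T @ scores + (1 - alpha) * prior
    return scores

# Toy network: genes 0 and 1 are known disease genes; gene 2 is tightly linked
# to them, while genes 3-5 form a separate module.
adj = np.array([[0, 1, 1, 0, 0, 0],
                [1, 0, 1, 0, 0, 0],
                [1, 1, 0, 1, 0, 0],
                [0, 0, 1, 0, 1, 1],
                [0, 0, 0, 1, 0, 1],
                [0, 0, 0, 1, 1, 0]], dtype=float)
scores = cost_sensitive_propagation(adj, positives=[0, 1])
print("candidate ranking:", np.argsort(-scores))   # gene 2 should rank highly
```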

Marco Frasca, Simone Bassis
Network Ranking Assisted Semantic Data Mining

Semantic data mining (SDM) uses annotated data and interconnected background knowledge to generate rules that are easily interpreted by the end user. However, the complexity of SDM algorithms is high, resulting in long running times even when applied to relatively small data sets. On the other hand, network analysis algorithms are among the most scalable data mining algorithms. This paper proposes an effective SDM approach that combines semantic data mining and network analysis. The proposed approach uses network analysis to extract the most relevant part of the interconnected background knowledge, and then applies a semantic data mining algorithm to the pruned background knowledge. The application to an acute lymphoblastic leukemia data set demonstrates that the approach is well motivated, is more efficient, and results in rules that are comparable to or better than the rules obtained by applying the incorporated SDM algorithm without network reduction in data preprocessing.
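
A hedged sketch of the pruning step: rank the background-knowledge graph with a network centrality measure (personalized PageRank via networkx is used here as one reasonable choice) and keep only the top-ranked part before running the SDM algorithm. The toy graph, seed choice and cut-off are illustrative, not the paper's setup.

```python
import networkx as nx

# Toy background-knowledge graph: ontology terms connected to annotated genes.
G = nx.Graph()
G.add_edges_from([("geneA", "term1"), ("geneB", "term1"), ("geneC", "term2"),
                  ("term1", "term2"), ("term2", "term3"), ("term3", "term4"),
                  ("term4", "term5"), ("geneD", "term5")])

# Personalize the ranking towards the genes present in the analysed data set.
seeds = {"geneA": 1.0, "geneB": 1.0, "geneC": 1.0}
rank = nx.pagerank(G, alpha=0.85, personalization=seeds)

# Keep only the most relevant fraction of the background knowledge; the
# semantic data mining algorithm would then run on this pruned subgraph.
keep = sorted(rank, key=rank.get, reverse=True)[:5]
pruned = G.subgraph(keep)
print(sorted(pruned.nodes()))
```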

Jan Kralj, Anže Vavpetič, Michel Dumontier, Nada Lavrač
A Comprehensive Comparison of Two MEDLINE Annotators for Disease and Gene Linkage: Sometimes Less is More

Text mining is popular in biomedical applications because it allows retrieving highly relevant information. Particularly for us, it is quite practical for linking diseases to the genes involved in them. However, text mining involves multiple challenges, such as (1) recognizing named entities (e.g., diseases and genes) inside the text, (2) constructing specific vocabularies that efficiently represent the available text, and (3) applying the correct statistical criteria to link biomedical entities with each other. We have previously developed Beegle, a tool that allows prioritizing genes for any search query of interest. The method starts with a search phase, where relevant genes are identified via the literature. Once known genes are identified, a second phase allows prioritizing novel candidate genes through a data fusion strategy. Many aspects of our method could potentially be improved. Here we evaluate two MEDLINE annotators that recognize biomedical entities inside a given abstract using different dictionaries and annotation strategies. We compare the contribution of each of the two annotators in associating genes with diseases under different vocabulary settings. Somewhat surprisingly, with fewer recognized entities and a more compact vocabulary, we obtain better associations between genes and diseases. We also propose a novel but simple association criterion to link genes with diseases, which relies on recognizing only gene entities inside the biomedical text. These refinements significantly improve the performance of our method.
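
The paper's own association criterion is described in the text; the sketch below only shows the simpler family it refines, scoring a gene-disease link by the over-representation of their co-occurrence across annotated abstracts with Fisher's exact test. The annotator output and entity names are toy examples.

```python
from scipy.stats import fisher_exact

def gene_disease_association(abstract_entities, gene, disease):
    """2x2 contingency table of abstracts mentioning the gene and/or the disease,
    scored with Fisher's exact test (one simple association criterion)."""
    both = gene_only = disease_only = neither = 0
    for entities in abstract_entities:
        g, d = gene in entities, disease in entities
        if g and d:
            both += 1
        elif g:
            gene_only += 1
        elif d:
            disease_only += 1
        else:
            neither += 1
    odds, p = fisher_exact([[both, gene_only], [disease_only, neither]],
                           alternative="greater")
    return odds, p

# Toy annotator output: the set of recognized entities per abstract.
abstracts = [{"BRCA1", "breast cancer"}, {"BRCA1", "breast cancer"},
             {"BRCA1"}, {"TP53", "breast cancer"}, {"TP53"}, {"EGFR"}]
print(gene_disease_association(abstracts, "BRCA1", "breast cancer"))
```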

Sarah ElShal, Jaak Simm, Adam Arany, Pooya Zakeri, Jesse Davis, Yves Moreau

Resources for Bioinformatics

Frontmatter
A Mechanistic Study of lncRNA Fendrr Regulation of FoxF1 Lung Cancer Tumor Suppressor

Long non-coding RNAs are known to play multiple roles in the complex machinery of the cell. However, their recent addition to genomic research has increased the complexity of gene expression analyses. In this work, we perform a computational study that aims to contribute to the current understanding of the mechanisms underlying the experimentally suggested interaction between the lncRNA Fendrr and the FoxF1 lung cancer tumor suppressor in carcinogenesis. Results suggest that there indeed exists a multi-level interaction between Fendrr and the FoxF1 promoter region, either direct, via RNA-DNA:DNA triplex domain formation, or mediated by proteins that interact simultaneously with the promoter region of FoxF1 and Fendrr transcripts. Moreover, the applied computational methodology can serve as a pipeline to process any candidate lncRNA-gene pair of interest and obtain putative sources of lncRNA-gene interaction.

Carmen Navarro, Carlos Cano, Marta Cuadros, Antonio Herrera-Merchan, Miguel Molina, Armando Blanco
Evaluation of Disambiguation Strategies on Biomedical Text Categorization

A common way of representing a text is as a bag of its component words (BoW). This representation suffers from a lack of sense, ignoring all the semantics that reside in the original text; in contrast, conceptualization using background knowledge enriches document representation models. When searching semantic resources for the senses corresponding to a polysemic term, multiple matches are detected, which introduces ambiguities into the final document representation. Three disambiguation strategies can be used: First Concept, All Concepts and Context-Based. SenseRelate is a well-known context-based algorithm which uses a fixed window size and takes into consideration a distance weight based on how far the terms in the context are from the target word. This may negatively impact the yielded concepts or senses. To overcome this problem, and therefore to enhance the process of biomedical WSD, in this paper we propose a simple modified version of the SenseRelate algorithm, named NoDistanceSenseRelate, which simply ignores the distance; that is, all the terms in the context have the same weight. To illustrate the efficiency of the SenseRelate and NoDistanceSenseRelate algorithms over the other methods, several experiments have been conducted in this study using the OHSUMED corpus. The results obtained using a biomedical text categorization system based on three machine learning models, Support Vector Machine (SVM), Naïve Bayes (NB) and Maximum Entropy (ME), show that the context-based methods (SenseRelate and NoDistanceSenseRelate) outperform the other ones.
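
A hedged sketch of the NoDistanceSenseRelate idea: every context term is weighted equally when scoring candidate senses of the target word. WordNet through NLTK is used here as a stand-in for a biomedical sense inventory such as UMLS, and the window handling and similarity measure are illustrative choices, not the authors' exact setup.

```python
from nltk.corpus import wordnet as wn   # requires a one-time nltk.download("wordnet")

def best_sense_equal_weight(target, context):
    """NoDistanceSenseRelate-style choice: score each candidate sense of the
    target by its summed similarity to the context words, all weighted equally
    (no distance-based decay within the window)."""
    best, best_score = None, -1.0
    for sense in wn.synsets(target):
        score = 0.0
        for word in context:
            sims = [sense.path_similarity(s) or 0.0 for s in wn.synsets(word)]
            score += max(sims, default=0.0)     # each context word counts once
        if score > best_score:
            best, best_score = sense, score
    return best

context = ["patient", "hospital", "therapy", "infection"]
sense = best_sense_equal_weight("culture", context)
print(sense, "-", sense.definition() if sense else "no sense found")
```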

Mohammed Rais, Abdelmonaime Lachkar
Biomolecular Annotation Integration and Querying to Help Unveiling New Biomedical Knowledge

Targeting biological questions requires comprehensive evaluation of multiple types of annotations describing current biological knowledge; they are increasingly available, but their fast evolution, heterogeneity and dispersion across many different sources hamper their effective use. Leveraging an innovative flexible data schema and automatic software procedures that support the integration of data sources evolving in number, data content and structure, while assuring quality and provenance tracking of the integrated data, we created a multi-organism Genomic and Proteomic Knowledge Base (GPKB) and have easily kept it updated. From several well-known databases it imports and integrates a very large amount of gene and protein data, external references and annotations, expressed through multiple biomedical terminologies. To easily query such integrated data, we developed intuitive web interfaces and services for programmatic access to the GPKB; they are publicly available at http://www.bioinformatics.deib.polimi.it/GPKB/ and http://www.bioinformatics.deib.polimi.it/GPKB-REST/, respectively. The GPKB is a very valuable resource used in several projects by many users; the developed interfaces enhance its relevance to the community by allowing the seamless composition of queries, even complex ones, on all the data integrated in the GPKB, which can help unveil new biomedical knowledge.

Arif Canakoglu, Stefano Ceri, Marco Masseroli
Backmatter
Metadata
Title
Bioinformatics and Biomedical Engineering
Editors
Francisco Ortuño
Ignacio Rojas
Copyright Year
2016
Electronic ISBN
978-3-319-31744-1
Print ISBN
978-3-319-31743-4
DOI
https://doi.org/10.1007/978-3-319-31744-1