2017 | Book

Computational Intelligence Methods for Bioinformatics and Biostatistics

13th International Meeting, CIBB 2016, Stirling, UK, September 1-3, 2016, Revised Selected Papers

edited by: Andrea Bracciali, Giulio Caravagna, David Gilbert, Prof. Roberto Tagliaferri

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

About this book

This book constitutes the thoroughly refereed post-conference proceedings of the 13th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2016, held in Stirling, UK, in September 2016. The 19 revised full papers and 6 keynote abstracts presented were carefully reviewed and selected from 61 submissions. The papers deal with the application of computational intelligence to open problems in bioinformatics, biostatistics, systems and synthetic biology, medical informatics, and computational approaches to life sciences in general.

Table of Contents

Frontmatter
Module Detection Based on Significant Shortest Paths for the Characterization of Gene Expression Data
Abstract
The characterization of diseases in terms of perturbed gene modules was recently introduced for the analysis of gene expression data. Some approaches have been proposed in the literature, but most of them are inductive: they try to infer key gene networks directly from data, ignoring the biological information available. Here a unique method for the detection of perturbed gene modules, based on the combination of data-driven and hypothesis-driven approaches, is described. It relies upon biological metabolic pathways and significant shortest paths evaluated by structural equation modeling (SEM). The procedure was tested on a microarray experiment concerning tuberculosis (TB). The validation of the final disease module was principally done with the Wang semantic similarity index and Disease Ontology enrichment analysis. Finally, a topological analysis of the module via centrality measures and the identification of cut vertices made it possible to unveil important nodes in the disease module network. The results obtained were promising, as shown by the detection of key genes for the characterization of the studied disease.
Daniele Pepe
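The cut-vertex step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the author's implementation: the toy network and gene labels are hypothetical, and a brute-force test (remove a node, recount connected components) stands in for whatever algorithm the paper actually used.

```python
from collections import deque


def connected_component_count(adjacency, removed=None):
    """Count connected components, ignoring the optional `removed` node."""
    seen = set()
    count = 0
    for start in adjacency:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour != removed and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return count


def cut_vertices(adjacency):
    """Return the nodes whose removal increases the number of components."""
    baseline = connected_component_count(adjacency)
    return sorted(
        node
        for node in adjacency
        if connected_component_count(adjacency, removed=node) > baseline
    )


# Hypothetical toy module: two triangles that share only gene "C".
toy_module = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D"},
}
print(cut_vertices(toy_module))
```

In this toy graph only "C" disconnects the module when removed, which is the sense in which a cut vertex marks a structurally important node. For large networks a linear-time depth-first approach would be preferable to this quadratic check.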
Information-Theoretic Active Contour Model for Microscopy Image Segmentation Using Texture
Abstract
High throughput technologies have increased the need for automated image analysis in a wide variety of microscopy techniques. Geometric active contour models provide a solution to automated image segmentation by incorporating statistical information in the detection of object boundaries. A statistical active contour may be defined by taking into account the optimisation of an information-theoretic measure between object and background. We focus on a product-type measure of divergence known as Cauchy-Schwarz distance which has numerical advantages over ratio-type measures. By using accurate shape derivation techniques, we define a new geometric active contour model for image segmentation combining Cauchy-Schwarz distance and Gabor energy texture filters. We demonstrate the versatility of this approach on images from the Brodatz dataset and phase-contrast microscopy images of cells.
Veronica Biga, Daniel Coca
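To make the "product-type" remark in the abstract above concrete, here is a minimal sketch of the Cauchy-Schwarz divergence between two discrete histograms. The discrete form and the function name are assumptions for illustration; the paper's own formulation (continuous densities inside an active contour energy) is not given in this listing.

```python
import math


def cauchy_schwarz_divergence(p, q):
    """D_CS(p, q) = -log( sum(p*q) / sqrt(sum(p^2) * sum(q^2)) )."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    cross = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))
    if cross == 0.0:
        return math.inf  # histograms with disjoint support
    return -math.log(cross / norm)


# Hypothetical intensity histograms for object and background regions.
object_hist = [0.1, 0.2, 0.7]
background_hist = [0.7, 0.2, 0.1]
print(cauchy_schwarz_divergence(object_hist, background_hist))
```

Because the measure uses only products of the two histograms, empty bins in one of them do not cause a division by zero, unlike ratio-type measures such as the Kullback-Leibler divergence.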
Host Phenotype Prediction from Differentially Abundant Microbes Using RoDEO
Abstract
Metagenomics is the study of metagenomes, which are mixtures of genetic material from several organisms. Metagenomic sequencing is increasingly used in human and animal health, food safety, and environmental studies. In these high-dimensional metagenomic data, the phenotype of the host organism (e.g., a human) may not be easy to detect, so the ability to predict it becomes a powerful analytic tool. For example, consider predicting the disease status of an individual from their gut microbiome.
In this study, we compare various normalization methods for metagenomic count data and their impact on phenotype prediction. The methods include RoDEO, Robust Differential Expression Operator, originally developed for gene expression studies. The best prediction accuracy is observed for RoDEO-processed count data with linear kernel support vector machines in most cases, for a variety of real datasets including human, mouse, and environmental samples.
We also address the problem of identifying the most relevant microbial features that could give insight into the structure and function of the differential communities observed between phenotypes. Interestingly, we obtain similar or better phenotype prediction accuracy with a small subset of features as with the complete set of sequenced features.
Anna Paola Carrieri, Niina Haiminen, Laxmi Parida
DeepScope: Nonintrusive Whole Slide Saliency Annotation and Prediction from Pathologists at the Microscope
Abstract
Modern digital pathology departments have grown to produce whole-slide image data at petabyte scale, an unprecedented treasure chest for medical machine learning tasks. Unfortunately, most digital slides are not annotated at the image level, hindering large-scale application of supervised learning. Manual labeling is prohibitive, requiring pathologists with decades of training and outstanding clinical service responsibilities. This problem is further aggravated by the United States Food and Drug Administration’s ruling that primary diagnosis must come from a glass slide rather than a digital image. We present the first end-to-end framework to overcome this problem, gathering annotations in a nonintrusive manner during a pathologist’s routine clinical work: (i) microscope-specific 3D-printed commodity camera mounts are used to video record the glass-slide-based clinical diagnosis process; (ii) after routine scanning of the whole slide, the video frames are registered to the digital slide; (iii) motion and observation time are estimated to generate a spatial and temporal saliency map of the whole slide. Demonstrating the utility of these annotations, we train a convolutional neural network that detects diagnosis-relevant salient regions, then report accuracy of 85.15% in bladder and 91.40% in prostate, with 75.00% accuracy when training on prostate but predicting in bladder, despite different pathologists examining the different tissues. When training on one patient but testing on another, AUROC in bladder is 0.79 ± 0.11 and in prostate is 0.96 ± 0.04. Our tool is available at https://bitbucket.org/aschaumberg/deepscope.
Andrew J. Schaumberg, S. Joseph Sirintrapun, Hikmat A. Al-Ahmadie, Peter J. Schüffler, Thomas J. Fuchs
PLS-SEM Mediation Analysis of Gene-Expression Data for the Evaluation of a Drug Effect
Abstract
Gene expression analysis can unveil the genes associated with the molecular action of a drug. However, it is not always clear how the differentially expressed genes restore the phenotype and whether, globally, the drug has an effect on the disease. We propose a method that exploits gene-expression data and network biology information to build a mediation analysis model for evaluating the effect of treatment on the disease at the molecular level. First, differentially expressed genes (DEGs) associated with the drug and the disease are discovered. Then, based on a pathway analysis, shortest paths between drug DEGs and disease DEGs are obtained. This allows the discovery of the mediator genes that connect drug genes to disease genes. The expression values of the three sets of genes are used to conduct a mediation analysis that evaluates the effect of the drug on the disease. The effect can be direct, indirect via mediators, or both. The latent variables and the mediation model are constructed using PLS-SEM. The procedure is applied to a real example concerning the effect of abacavir on HIV samples. The proposed pipeline offers an additional tool for understanding the etiology of a disease and unveiling the mechanisms of action of a drug at the gene level.
Daniele Pepe, Tomasz Burzykowski
A Novel Algorithm for CpG Island Detection in Human Genome Based on Clustering and Chaotic Particle Swarm Optimization
Abstract
CpG islands play a major role in the genome and are used for the prediction of promoter regions. They are abnormally methylated in cancer cells and can be used as tumor markers. However, current techniques for identifying CpG islands suffer from various drawbacks. In this paper, we propose a novel algorithm to detect CpG islands by combining clustering techniques with complementary chaotic particle swarm optimization. Clustering techniques are used to find the locations of potential CpG island candidates in the genome, while Complementary Chaotic PSO is used to find the best location of a CpG island in a cluster candidate without being trapped in a local optimum. This combination can successfully overcome the drawbacks of each method while maintaining their advantages. The proposed method, called 3C-PSO, provides high-sensitivity detection of CpG islands in the human genome. To evaluate its performance, we used six sequences from NCBI and five measures of performance: sensitivity (SN), specificity (SP), accuracy (ACC), performance coefficient (PC), and correlation coefficient (CC). We compared our approach to the existing methods of CpG island detection in the human genome. The obtained results show that 3C-PSO competes with and even outperforms these methods.
Abdelbasset Boukelia, Zakaria Benmounah, Mohamed Batouche, Bouchera Maati, Ikram Nekkache
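The abstract above names five performance measures without defining them. The sketch below uses the definitions that are conventional in the CpG island detection literature, computed from counts of true/false positives and negatives; treating these as the formulas the authors used is an assumption, and the counts in the example are made up.

```python
import math


def detection_metrics(tp, tn, fp, fn):
    """Compute SN, SP, ACC, PC and CC from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. each class and each
    prediction outcome occurs at least once.
    """
    sn = tp / (tp + fn)                    # sensitivity
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    pc = tp / (tp + fn + fp)               # performance coefficient
    cc = (tp * tn - fn * fp) / math.sqrt(  # correlation coefficient
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)
    )
    return {"SN": sn, "SP": sp, "ACC": acc, "PC": pc, "CC": cc}


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=80, tn=900, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PC ignores true negatives, so it is less inflated than ACC when non-island positions vastly outnumber island positions.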
COSYS: A Computational Infrastructure for Systems Biology
Abstract
Computational models are essential in order to integrate and extract knowledge from the large amount of -omics data that are increasingly being collected thanks to high-throughput technologies. Unfortunately, the definition of an appropriate mathematical model is typically inaccessible to scientists with a poor computational background, whereas expert users often lack the proficiency required for biologically grounded models. Although much effort has been put into software packages intended to bridge the gap between the two communities, once a model is defined, the problem of simulating and analyzing it within a reasonable time still persists. We here present COSYS, a web-based infrastructure for Systems Biology that guides the user through the definition, simulation and analysis of reaction-based models, including the deterministic and stochastic description of the temporal dynamics, and Flux Balance Analysis. In the case of computationally demanding analyses, COSYS can exploit GPU-accelerated algorithms to speed up the computation, thereby making critical tasks, such as an exhaustive scan of parameter values, accessible to a large audience.
Fabio Cumbo, Marco S. Nobile, Chiara Damiani, Riccardo Colombo, Giancarlo Mauri, Paolo Cazzaniga
Statistical Texture-Based Mapping of Cell Differentiation Under Microfluidic Flow
Abstract
Timelapse microscopy enables long-term monitoring of biological processes; however, a major bottleneck in assessing experimental outcome is the need for an automated analysis framework to extract statistics and evaluate results. In this study, we use Gabor energy texture descriptors to generate a high-dimensional feature space, which is analysed with principal component analysis to provide unsupervised characterisation of texture differences between pairs of images. We apply this technique to the differentiation of human embryonic carcinoma cells in the presence of all-trans retinoic acid (RA) and show that differentiation outcome can be predicted directly from texture information. A microfluidic environment is used to deliver pulses of RA stimulation over five days in culture. Results provide insight into the dynamics of cell response to differentiation signals over time.
Veronica Biga, Olívia M. Alves Coelho, Paul J. Gokhale, James E. Mason, Eduardo M. A. M. Mendes, Peter W. Andrews, Daniel Coca
Constraining Mechanism Based Simulations to Identify Ensembles of Parametrizations to Characterize Metabolic Features
Abstract
Constraint-based approaches have proven useful for determining steady-state fluxes in metabolic models; however, they cannot determine metabolite concentrations, and they assume that a biological process is optimized towards a given function. In this work we define a computational strategy that exploits mechanism-based simulations as a framework to determine, through a filtering procedure, ensembles of kinetic constants and steady-state metabolic concentrations that are in agreement with one or more metabolic phenotypes, while avoiding the need to assume an optimization mechanism. To test our procedure, we exploited a model of yeast metabolism and filtered trajectories according to a loose definition of the Crabtree phenotype.
Riccardo Colombo, Chiara Damiani, Giancarlo Mauri, Dario Pescini
Process Algebra with Layers: Multi-scale Integration Modelling Applied to Cancer Therapy
Abstract
We present a novel Process Algebra designed for multi-scale integration modelling: Process Algebra with Layers (PAL). The unique feature of PAL is the modularisation of scale into integrated layers: Object and Population. An Object can represent a molecule, organelle, cell, tissue, organ or any organism. Populations hold specific types of Object, for example, life stages, cell phases and infectious states. The syntax and semantics of this novel language are presented. A PAL model of the multi-scale system of cell growth and damage from cancer treatment is given. This model allows the analysis of different scales of the system. The Object and Population levels give insight into the length of a cell cycle and cell population growth respectively. The PAL model results are compared to wet laboratory survival fractions of cells given different doses of radiation treatment [1]. This comparison shows how PAL can be used to aid in investigations of cancer treatment in systems biology.
Erin Scott, James Nicol, Jonathan Coulter, Andrew Hoyle, Carron Shankland
A Problem-Driven Approach for Building a Bioinformatics GraphDB
Abstract
The development of high-throughput technology in biological and medical domains has been accompanied by a growing involvement of informatics support. Indeed, the large amount of data produced is difficult to analyse and interpret, both in terms of the time required and the number of different resources involved. In this context, the challenge is to have an integrated, multi-component database with a user-friendly interface that can solve biological problems without requiring a high level of prior bioinformatics knowledge. This need arises from the evidence that biologists have multi-task and multi-level problems to solve. To this aim, we propose a bottom-up, graph-based approach for integrating bioinformatics resources, usually databases, starting from typical biological scenarios, in order to solve novel bioinformatics problems. The integrated resources can be queried by means of a graph traversal language such as Gremlin.
Antonino Fiannaca, Massimo La Rosa, Laura La Paglia, Antonio Messina, Riccardo Rizzo, Alfonso Urso
Parameter Inference in Differential Equation Models of Biopathways Using Time Warped Gradient Matching
Abstract
Parameter inference in mechanistic models of biopathways based on systems of coupled differential equations is a topical yet computationally challenging problem due to the fact that each parameter adaptation involves a numerical integration of the differential equations. Techniques based on gradient matching, which aim to minimize the discrepancy between the slope of a data interpolant and the derivatives predicted from the differential equations, offer a computationally appealing shortcut to the inference problem. Gradient matching critically hinges on the smoothing scheme for function interpolation, with spurious oscillations in the interpolant having a dramatic effect on the subsequent inference. The present article demonstrates that a time warping approach that aims to homogenize intrinsic functional length scales can lead to a significant improvement in parameter estimation accuracy. We demonstrate the effectiveness of this scheme on noisy data from a dynamical system with a periodic limit cycle, and a biopathway model.
Mu Niu, Simon Rogers, Maurizio Filippone, Dirk Husmeier
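The gradient-matching idea described in the abstract above can be shown on a deliberately simple case. This sketch is not the authors' method: it assumes a one-parameter model dx/dt = -k x, uses central differences in place of a smooth data interpolant, works on noise-free data, and omits time warping altogether. All names are illustrative.

```python
import math


def estimate_decay_rate(times, values):
    """Gradient matching for dx/dt = -k * x.

    Slopes come from central differences on the data, and k minimises
    sum_i (slope_i + k * x_i)^2, which has the closed-form solution
    k = -sum(slope_i * x_i) / sum(x_i^2).
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(1, len(times) - 1):
        slope = (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        numerator += slope * values[i]
        denominator += values[i] ** 2
    return -numerator / denominator


# Synthetic, noise-free observations generated with a true rate of 0.5.
times = [0.1 * i for i in range(51)]
values = [math.exp(-0.5 * t) for t in times]
k_hat = estimate_decay_rate(times, values)
print(round(k_hat, 3))  # close to the true rate 0.5
```

No differential equation is integrated at any point, which is the computational shortcut the abstract refers to. With noisy data the finite-difference slopes become unreliable, which is why the choice of smoothing scheme matters so much in practice.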
IRIS-TCGA: An Information Retrieval and Integration System for Genomic Data of Cancer
Abstract
Data integration is one of the most challenging research topics in many knowledge domains, and biology is surely one of them. However, theory and state-of-the-art methods make this task complex for most small research centers. Fortunately, several organizations are focusing on collecting heterogeneous data, making it easier to design analysis tools and test biological and medical hypotheses on integrated data. One of the most evident cases of such efforts is The Cancer Genome Atlas (TCGA), a database that contains a large variety of information related to different types of cancer. This database offers a great opportunity to those interested in analyzing integrated data; however, its exploitation is not easy, since nontrivial effort is required to extract and combine data before they can be analyzed from an integrated perspective. In this paper we present IRIS-TCGA, an online web service developed to perform multiple queries for data integration on TCGA. Unlike other tools that have been proposed to interact with TCGA, IRIS-TCGA allows direct access to the data and makes it possible to extract detailed combinations of subsets of the repository, according to filters and high-order queries. The structure of the system is simple, as it is built on two main operators, union and intersection, that are then used to construct queries of higher complexity. The first version of the system supports the extraction and integration of gene expression (RNA-sequencing, microarrays), DNA-methylation, and DNA-sequencing (mutations) data from experiments on tissues of patients, together with their related metadata, in a gene-oriented organization. The extracted data matrices are particularly suited for data mining applications (e.g., classification).
Finally, we show two application examples, where IRIS-TCGA is used for integrating genomic data from RNA-sequencing and DNA-methylation experiments, and where state-of-the-art bioinformatics analysis tools are applied to the integrated data in order to extract new knowledge from them. IRIS-TCGA is freely available at http://bioinf.iasi.cnr.it/iristcga/.
Fabio Cumbo, Emanuel Weitschek, Paola Bertolazzi, Giovanni Felici
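The abstract above says that higher-order queries are composed from two operators, union and intersection. The sketch below illustrates that composition with plain Python sets. It is not the IRIS-TCGA interface: the function names, the patient identifiers, and the idea of representing each experiment type as a set of patient IDs are all assumptions made for illustration.

```python
def union(*sample_sets):
    """Patients that appear in at least one of the given sets."""
    return set().union(*sample_sets)


def intersection(first, *others):
    """Patients that appear in every one of the given sets."""
    return set(first).intersection(*others)


# Hypothetical patient IDs for which each experiment type is available.
rna_seq = {"P01", "P02", "P03"}
methylation = {"P02", "P03", "P04"}
mutations = {"P03", "P05"}

# Higher-order query built from the two operators:
# patients with RNA-seq data AND (methylation OR mutation data).
selected = intersection(rna_seq, union(methylation, mutations))
print(sorted(selected))
```

Because both operators take sets and return a set, their results can be nested to any depth, which is what allows two primitives to express more complex selections.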
Effect of UV Radiation on DPPG and DMPC Liposomes in Presence of Catechin Molecules
Abstract
Catechin molecules are known to reduce the oxidative stress induced by radiation by acting as scavengers of reactive oxygen species, thereby preventing damage to biomolecules. In this work, the effect of radiation on liposomes of 1,2-dipalmitoyl-sn-glycero-3-[phospho-rac-(1-glycerol)] (sodium salt) (DPPG) and of 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) is analyzed in the absence and presence of epigallocatechin-3-gallate (EGCG) molecules, with a view to evaluating the photosensitizing properties of these molecules and their efficacy in modulating cell membrane damage mechanisms. The obtained results demonstrate that the damage by UV radiation on DPPG and DMPC liposomes is strongly dependent on the presence of EGCG molecules. While DPPG liposomes are protected from radiation in the presence of EGCG, the EGCG molecules are damaged by the radiation, supporting the idea that EGCG molecules are strongly adsorbed on the inner and outer liposome surfaces due to hydrogen bonding. This suggests that EGCG molecules on the inner surface can be protected from radiation. In the case of DMPC liposomes, the EGCG molecules are affected by radiation, as are the DMPC molecules. This is explained if the EGCG chroman group is positioned between DMPC lipids while the gallic acid groups float over the liposomes.
Filipa Pires, Gonçalo Magalhães-Mota, Paulo António Ribeiro, Maria Raposo
Inference in a Partial Differential Equations Model of Pulmonary Arterial and Venous Blood Circulation Using Statistical Emulation
Abstract
The present article addresses the problem of inference in a multiscale computational model of pulmonary arterial and venous blood circulation. The model is a computationally expensive simulator which, given specific parameter values, solves a system of nonlinear partial differential equations and returns predicted pressure and flow values at different locations in the arterial and venous blood vessels. The standard approach in parameter calibration for computer code is to emulate the simulator using a Gaussian Process prior. In the present work, we take a different approach and emulate the objective function itself, i.e. the residual sum of squares between the simulations and the observed data. The Efficient Global Optimization (EGO) algorithm [2] is used to minimize the residual sum of squares. A generalization of the EGO algorithm that can handle hidden constraints is described. We demonstrate that this modified emulator achieves a reduction in the computational costs of inference by two orders of magnitude.
Umberto Noè, Weiwei Chen, Maurizio Filippone, Nicholas Hill, Dirk Husmeier
Ensemble Approaches for Stable Assessment of Clusters in Microbiome Samples
Abstract
The fundamental endeavour to understand the microbiome and its functions starts with detecting which microbes are present in the samples and continues with comparing different samples and finding similar ones based on their community compositions. A pervasive method for accomplishing these steps is clustering. However, clustering brings a number of choices regarding algorithms, parameters, distance/similarity metrics, etc., that produce different outcomes, making results hard to interpret. The study presented here examines the stability of clusters in the context of various beta diversity metrics applied to human microbiome samples. We explored the effects of 24 different diversity metrics on clustering outcomes and their impact on the accuracy of the clustering of microbiome samples. To overcome the unclear results of individual clusterings that rely on distinct beta diversity metrics, we employed two ensemble approaches to integrate the results of individual clusterings. The results obtained on human microbiome data imply that ensemble clustering approaches produce stable results in reconstructing clusters that correspond to the different hosts and body habitats.
Sanja Brdar, Vladimir Crnojević
Multilayer Data and Document Stratification for Comorbidity Analysis
Abstract
In this work, we introduce two novel contributions to the study of comorbidity. The first is a new method for finding disease correlations, using a multitude of information sources. In the era of big data, methods such as evidence synthesis enable researchers to exploit many freely available information sources to enrich their analyses. This forms the basis for our method where in lieu of examining one form of evidence, we introduce a novel combination of sources, providing an indirect association between patient genetic data and the scientific literature. Our second contribution is a new method for stratifying the scientific literature when searching for newly discovered disease correlations. Given that the volume of published biomedical literature has increased dramatically, a clinician does not have the ability to read every relevant article. We therefore propose a new way for refining the literature search space to discover recently introduced disease correlations. Results show that our system can produce reasonable hypotheses for disease correlations, and that document stratification is an important aspect to take into account when using scientific literature.
Kevin Heffernan, Pietro Liò, Simone Teufel
Evolving Dendritic Morphologies Highlight the Impact of Structured Synaptic Inputs on Neuronal Performance
Abstract
Dendrites, the most conspicuous elements of neurons, extensively determine a cell’s capacity to recognise synaptic inputs. Investigating their structure and morphological properties helps unravel the functioning mechanisms of neurons that contribute to learning and memory. This research systematically generates varying dendritic topologies in a multi-compartmental model of a neuron with passive properties and further explores a cell’s ability to integrate complex synaptic potentials. The neurons receive an equal number of binary input patterns of synaptic activity, and the performance of a cell is gauged by calculating the signal-to-noise ratio between amplitudes of somatic voltage. The objective is to analyse the types of input pattern that, in combination with morphological properties, may strengthen or weaken the somatic response. Finally, an evolutionary algorithm produces a fine variety of branching structures calculating the weighted sum of synaptic inputs, further identifying the impact of membrane and morphological properties on neuronal performance.
Mohammad Ziyad Kagdi
Semantic Clustering for Identifying Overlapping Biological Communities
Abstract
Proteins encoded by the genes associated with a common disorder interact together, participate in similar pathways, and share Gene Ontology (GO) terms. Drug discovery for certain diseases may arise from a hypothesis that genes contributing to a common disorder have an increased tendency for their products to be linked at various functional levels. This may be induced from experimental studies of protein-protein interactions, co-regulation, co-expression, and annotated semantic information (e.g., that stored in Gene Ontology). Our aim is to improve the quality of aggregation discovery in dense biological interactions by incorporating such information embedded in biological repositories and mapping it into the spectral embedding space.
Hassan Mahmoud, Francesco Masulli, Stefano Rovetta
Backmatter
Metadata
Title
Computational Intelligence Methods for Bioinformatics and Biostatistics
edited by
Andrea Bracciali
Giulio Caravagna
David Gilbert
Prof. Roberto Tagliaferri
Copyright year
2017
Electronic ISBN
978-3-319-67834-4
Print ISBN
978-3-319-67833-7
DOI
https://doi.org/10.1007/978-3-319-67834-4