
2013 | Book


7th International Conference on Practical Applications of Computational Biology & Bioinformatics

Editors: Mohd Saberi Mohamad, Loris Nanni, Miguel P. Rocha, Florentino Fdez-Riverola

Publisher: Springer International Publishing

Book Series: Advances in Intelligent Systems and Computing


About this book

The growth of the Bioinformatics and Computational Biology fields over the last few years has been remarkable, and the pace is expected to increase. In fact, the need for computational techniques that can efficiently handle the huge amounts of data produced by new experimental techniques in Biology is still increasing, driven by advances in Next Generation Sequencing, several types of so-called omics data, and image acquisition, to name just a few. The analysis of the datasets these techniques produce, and their integration, call for new algorithms and approaches from fields such as Databases, Statistics, Data Mining, Machine Learning, Optimization, Computer Science and Artificial Intelligence. Within this scenario of increasing data availability, Systems Biology has also been emerging as an alternative to the reductionist view that dominated biological research in recent decades. Indeed, Biology is increasingly a science of information that requires tools from the computational sciences. In the last few years, we have seen the rise of a new generation of interdisciplinary scientists with a strong background in both the biological and computational sciences. In this context, the interaction of researchers from different scientific fields is, more than ever, of foremost importance, boosting research efforts in the field and contributing to the education of a new generation of Bioinformatics scientists. PACBB'13 hopes to contribute to this effort by promoting this fruitful interaction. The PACBB'13 technical program included 19 papers from a submission pool of 32 papers spanning many different sub-fields in Bioinformatics and Computational Biology. The conference will therefore have promoted the interaction of scientists from diverse research groups and with distinct backgrounds (computer scientists, mathematicians, biologists). The scientific content will certainly be challenging and will promote the improvement of the work being developed by each of the participants.

Frontmatter

Gene Functional Prediction Using Clustering Methods for the Analysis of Tomato Microarray Data

Abstract

Molecular mechanisms of plant-pathogen interaction have been studied thoroughly because of their importance for crop production and food supply. This knowledge is a starting point for identifying new and specific resistance genes by detecting similar expression patterns. Here we evaluate the usefulness of clustering and data-mining methods for grouping known plant resistance genes based on expression profiles. We conduct clustering separately on P. infestans-inoculated and non-inoculated tomatoes and conclude that analyzing each condition separately is important, because the groupings differ, reflecting a characteristic behavior of resistance genes in the presence of the pathogen.

Liliana López-Kleine, José Romeo, Francisco Torres-Avilés

Analysis of Word Symmetries in Human Genomes Using Next-Generation Sequencing Data

Abstract

We investigate Chargaff’s second parity rule and its extensions in the human genome, and evaluate its statistical significance. This phenomenon has been previously investigated in the reference human genome, but this sequence does not represent a proper sampling of the human population. With the 1000 genomes project, we have data from next-generation sequencing of different human individuals, constituting a sample of 1092 individuals. We explore and analyze this new type of data to evaluate the phenomenon of symmetry globally and for pairs of symmetric words.

Our methodology is based on measurements, traditional statistical tests and equivalence statistical tests using different parameters (e.g. mean, correlation coefficient).

We find that the global symmetry phenomenon is significant for word lengths smaller than 8. However, even when the global symmetry is significant, some symmetric word pairs do not show a significant positive correlation, but rather a small or non-positive one.
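To make the notion of symmetric word pairs concrete, the following Python sketch counts each word of length k on a single strand alongside its reverse complement, which is the comparison that Chargaff's second parity rule concerns. It is an illustration only and not the authors' methodology: the function names and the toy sequence are assumptions, and no statistical test is performed.

```python
from collections import Counter

# Translation table for complementing a DNA string (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def word_counts(seq, k):
    """Count overlapping words of length k, skipping words with non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in "ACGT" for c in word):
            counts[word] += 1
    return counts


def symmetric_word_pairs(seq, k):
    """Map each symmetric pair (keyed by its lexicographically smaller word)
    to the tuple (count of the word, count of its reverse complement)."""
    counts = word_counts(seq, k)
    pairs = {}
    for word in counts:
        key = min(word, reverse_complement(word))
        if key not in pairs:
            pairs[key] = (counts[key], counts[reverse_complement(key)])
    return pairs


if __name__ == "__main__":
    # Hypothetical toy sequence; real analyses use whole-genome data.
    for word, (n_word, n_rc) in sorted(symmetric_word_pairs("AACGTT", 2).items()):
        print(word, reverse_complement(word), n_word, n_rc)
```

Under the rule, the two counts in each pair are expected to be close for short words; the abstract reports that this holds globally for word lengths below 8 but not for every individual pair.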

Vera Afreixo, João M. O. S. Rodrigues, Sara P. Garcia

A Clustering Framework Applied to DNA Microarray Data

Abstract

This paper presents a case study to show the competence of our evolutionary framework for cluster analysis of DNA microarray data. The proposed framework joins a genetic algorithm for hierarchical clustering with a set of visual components of cluster tasks given by a tool. The cluster visualization tool allows us to display different views of clustering results as a means of cluster visual validation. The results of the genetic algorithm for clustering have shown that it can find better solutions than the other methods for the selected data set. Thus, this shows the reliability of the proposed framework.

José A. Castellanos-Garzón, Fernando Díaz

Segmentation of DNA into Coding and Noncoding Regions Based on Inter-STOP Symbols Distances

Abstract

In this study we set out to explore the potential of the inter-genomic symbols distance for finding coding regions in DNA sequences. We use the distance between STOP symbols in the DNA sequence and a chi-square statistic to evaluate the nonhomogeneity of the three possible reading frames. The results of this exploratory study suggest that the inter-STOP symbols distance has a strong ability to discriminate coding regions.
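The idea of measuring distances between STOP symbols in each reading frame can be sketched in a few lines of Python. The abstract does not give the exact statistic, so the chi-square below is an assumption: it simply compares the number of STOP codons observed in the three frames against a uniform expectation. All function names and the toy sequence are hypothetical.

```python
# Standard STOP codons on the DNA coding strand.
STOPS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """Return the start indices of STOP codons in the given reading frame (0, 1 or 2)."""
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]


def inter_stop_distances(seq, frame):
    """Return distances, in codons, between consecutive STOP codons in one frame."""
    positions = stop_positions(seq, frame)
    return [(b - a) // 3 for a, b in zip(positions, positions[1:])]


def frame_chi_square(seq):
    """Chi-square statistic of STOP counts per frame against a uniform expectation.

    Assumption: this is one simple way to quantify nonhomogeneity between frames,
    not necessarily the statistic used in the paper.
    """
    observed = [len(stop_positions(seq, frame)) for frame in range(3)]
    expected = sum(observed) / 3
    if expected == 0:
        return 0.0
    return sum((o - expected) ** 2 / expected for o in observed)


if __name__ == "__main__":
    seq = "ATGAAATAAGGGTAG"  # hypothetical toy sequence
    for frame in range(3):
        print(frame, stop_positions(seq, frame), inter_stop_distances(seq, frame))
    print(frame_chi_square(seq))
```

In a coding region the translated frame is expected to contain long STOP-free stretches while the other two frames are not, which is why an imbalance between frames can act as a signal.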

Carlos A. C. Bastos, Vera Afreixo, Sara P. Garcia, Armando J. Pinho

Assignment of Novel Functions to Helicobacter pylori 26695’s Genome

Abstract

Helicobacter pylori is a pathogenic bacterium that colonizes the human epithelia, causing duodenal and gastric ulcers as well as gastric cancer. The genome of H. pylori 26695 has been sequenced and annotated. In addition, two genome-scale metabolic models have been developed. In order to maintain accurate and relevant information on coding sequences (CDS) and to retrieve new information, the assignment of new functions to Helicobacter pylori 26695’s genes was performed. The use of software tools, on-line databases and an annotation pipeline for inspecting each gene allowed the attribution of validated E.C. numbers to metabolic genes, and the assignment of 177 new functions to the CDS of this bacterium. This information provides relevant biological information for the scientific community dealing with this organism and can be used as the basis for a new metabolic model reconstruction.

Tiago Resende, Daniela M. Correia, Isabel Rocha

Analysing Quality Measures of Phasing Algorithms in Genome-Wide Haplotyping

Abstract

Inferring haplotype phase is a subject of interest and, given the increasing number of GWAS and related applications such as genetic profiling, phasing algorithms need to perform well even when processing large DNA fragments. Most studies focus on genomic regions of limited length; we therefore propose to test the most common statistics on genetic regions of variable length. We have found that one of the most widely used, the switch error, is insufficient over long distances: it converges to a constant value that does not truly reflect the quality of the inferred phase. Furthermore, the IGP (incorrect genotype percentage) is a much more precise measure of the quality of the algorithm. New phasing algorithms should not consider only the number of switches, because in some cases (classifiers to assess genetic risks, for example) it is important to distinguish the haplotype of each parent to obtain better results.
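As a rough illustration of the switch error discussed above, the Python sketch below uses a common textbook definition: walk along the heterozygous sites and count how often the inferred haplotype changes which true haplotype it agrees with. The function name, the 0/1 allele encoding and the example data are assumptions; the paper's exact definitions of the switch error and of the IGP are not reproduced here.

```python
def switch_error_rate(true_hap, inferred_hap, het_sites):
    """Fraction of consecutive heterozygous site pairs where the phase flips.

    true_hap and inferred_hap hold one haplotype of each pair as 0/1 alleles;
    het_sites lists the indices of heterozygous sites in genomic order.
    """
    # At a heterozygous site the inferred haplotype either matches the chosen
    # true haplotype or matches the other one.
    agrees = [true_hap[i] == inferred_hap[i] for i in het_sites]
    if len(agrees) < 2:
        return 0.0
    # A switch is a change of agreement between neighbouring heterozygous sites.
    switches = sum(a != b for a, b in zip(agrees, agrees[1:]))
    return switches / (len(agrees) - 1)


if __name__ == "__main__":
    true_hap = [0, 1, 0, 1, 0]       # hypothetical data
    inferred_hap = [0, 1, 1, 0, 0]
    print(switch_error_rate(true_hap, inferred_hap, range(5)))
```

Because this measure only looks at neighbouring sites, a low value does not guarantee that distant sites are assigned to the same parental haplotype, which is consistent with the limitation over long distances that the abstract describes.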

Sergio Torres-Sánchez, Manuel García-Sánchez, Nuria Medina-Medina, María Mar Abad-Grau

Search Functional Annotations Genetic Relationships of Coffee through Bio2RDF

Abstract

Protein sequence analysis can draw on various approaches to find the phenotypic and functional characteristics of gene structure. Fortunately, many gene models are already described in databases of biological information.

Linking coffee transcriptome gene annotations to centralized biological information systems such as Bio2RDF offers the possibility of finding relationships among the associated transcriptomes, and between terms and concepts governed by semantic rules defined in ontologies.

Sesame was used as a repository to store triples related to coffee, together with transcriptome indexes obtained from the Protein Data Bank (PDB); these relationships are the foundation for semantic search using SPARQL.

Data from the functional relationship searches are exposed through the endpoint provided by the Sesame repository and Pubby.

Luis Bertel-Paternina, Luis F. Castillo, Alvaro Gaitán-Bustamente, Narmer Galeano-Vanegas, Gustavo Isaza

A Cellular Automaton Model of the Effects of Maspin on Cell Migration

Abstract

Maspin (Mammary Serine Protease Inhibitor) is a non-inhibitory serpin with multiple cellular effects that is a type II tumour metastasis suppressor. Maspin has been shown to reduce cell migration, invasion, proliferation and angiogenesis, and to increase apoptosis and adhesion. In this paper, we report the development of a mathematical model of the effects of maspin on cellular proliferation and migration. An artificial neural network has been used to model the unknown cell signalling that determines the cell's fate. Results show that maspin reduces migration by 10-35%, which is confirmed by published in vitro data. To our knowledge, this is the first attempt to model maspin effects in a computational model to verify in vitro data. This will provide new insights into the tumour suppressive properties of maspin and inform the development of novel cancer therapy.

M. A. Al-Mamun, M. A. Hossain, M. S. Alam, R. Bass

Quantitative Characterization of Protein Networks of the Oral Cavity

Abstract

Modeling protein interactions as complex networks allows graph theory to be applied to help understand their topology, validate previous evidence and uncover new biological associations. Topological properties are recognized for their contribution to the understanding of the structure, functional relationships and evolution of complex networks, aiding a better comprehension of disease mechanisms and the identification of drug targets. The human interactome, i.e. the network formed by all protein-protein interactions, is a complex and as yet unknown system.

In this paper we present the results of a study about the topological properties of the oral protein network. We evaluate several confidence scores and prediction methods, in order to compare these networks with random organizations with the same size.
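Two of the simplest topological properties of an interaction network, node degree and the local clustering coefficient, can be computed as in the Python sketch below. The abstract does not list which properties the study measured, so these two are chosen only as familiar examples; the protein names and edges are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected graph as a dict mapping each node to its neighbour set."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of interaction partners of a node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of a node that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


if __name__ == "__main__":
    # Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
    graph = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
    for node in sorted(graph):
        print(node, degree(graph, node), clustering_coefficient(graph, node))
```

Comparing such values with those of randomized networks of the same size is one common way to judge whether an observed topology is unusual.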

Fernanda Correia Barbosa, Joel P. Arrais, José Luís Oliveira

Analysing Relevant Diseases from Iberian Tweets

Abstract

The Internet constitutes a huge source of information that can be exploited by individuals in many different ways. With the increasing use of social networks and blogs, the Internet is now used not only as an information source but also to disseminate personal health information. In this paper we exploit the wealth of user-generated data, available through the micro-blogging service Twitter, to estimate and track the incidence of health conditions in society, specifically in Portugal and Spain. We present results for the acquisition of relevant tweets for a set of four different conditions (flu, depression, pregnancy and eating disorders) and for the binary classification of these tweets as relevant or not for each case. The results obtained, ranging in AUC from 0.7 to 0.87, are very promising and indicate that such an approach provides a feasible solution for measuring and tracking the evolution of many health-related aspects within society.

Víctor M. Prieto, Sergio Matos, Manuel Álvarez, Fidel Cacheda, José Luís Oliveira

Optimized Workflow for the Healthcare Logistic: Case of the Pediatric Emergency Department

Abstract

The Emergency Department (ED) in a hospital, as its name implies, is a facility to be utilized by those who require emergency medical care. This paper introduces the longitudinal organization of patient handling in the Pediatric Emergency, called the “Pediatric Emergency Path”. This work discusses the usability of the workflow approach for designing the patient path in the Pediatric Emergency Department (PED) in order to counter the complexity of care. The goal is to optimize these paths to improve the quality of patient handling while controlling the wait time. The development of this model was based on careful visits made to the PED of the Regional University Hospital Center (CHRU) of Lille (France). This modeling must represent the reality of the PED of the CHRU of Lille as faithfully as possible. It must be detailed enough to support an analysis that identifies the dysfunctions of the PED and to propose and evaluate indicators for preventing strain. Our study is part of the French National Research Agency project titled “Hospital: optimization, simulation and avoidance of strain” (ANR HOST).

Inès Ajmi, Hayfa Zgaya, Slim Hammadi

Boosting the Detection of Transposable Elements Using Machine Learning

Abstract

Transposable Elements (TEs) are sequences of DNA that move and transpose within a genome. As mutation agents, TEs are important for their role both in diseases caused by genome alteration and in species evolution. Several tools have been developed to discover and annotate TEs, but no single one achieves good results on all types of TEs. In this paper we evaluate the performance of several TE detection and annotation tools and investigate whether Machine Learning techniques can be used to improve their overall detection accuracy. The results of an in silico evaluation of TE detection and annotation tools indicate that their performance can be improved by using machine learning classifiers.

Tiago Loureiro, Rui Camacho, Jorge Vieira, Nuno A. Fonseca

On an Individual-Based Model for Infectious Disease Outbreaks

Abstract

The mathematical modelling of infectious diseases is a large research area with an extensive literature. In the recent past, most of the scientific contributions focused on compartmental models. However, increasing computing power is pushing towards the development of individual-based models that consider disease transmission and evolution at a very fine-grained level. In the paper, the authors give a short state of the art of compartmental models, summarise one of the best-known individual-based models, and describe a generalization and a simulation algorithm.
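For readers unfamiliar with compartmental models, the classic SIR model is a convenient reference point. The Python sketch below integrates it with a simple forward Euler step; it is not the individual-based model or the simulation algorithm described in the paper, and the parameter values, function names and step size are assumptions chosen only for illustration.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance the SIR compartments by one forward Euler step of length dt.

    beta is the transmission rate and gamma the recovery rate.
    """
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries


def simulate_sir(s0, i0, r0, beta, gamma, dt, steps):
    """Return the list of (S, I, R) states, including the initial state."""
    trajectory = [(s0, i0, r0)]
    for _ in range(steps):
        trajectory.append(sir_step(*trajectory[-1], beta, gamma, dt))
    return trajectory


if __name__ == "__main__":
    # Hypothetical outbreak: 999 susceptible, 1 infectious, none recovered.
    trajectory = simulate_sir(999.0, 1.0, 0.0, beta=0.3, gamma=0.1, dt=0.1, steps=1000)
    s, i, r = trajectory[-1]
    print(round(s, 1), round(i, 1), round(r, 1))
```

A compartmental model tracks only the size of each group, whereas an individual-based model would keep a state per person and simulate each contact, which is what makes it more demanding computationally.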

Pierpaolo Vittorini, Ferdinando di Orio

A Statistical Comparison of SimTandem with State-of-the-Art Peptide Identification Tools

Abstract

Similarity search in theoretical mass spectra generated from protein sequence databases is a widely accepted approach for identifying peptides from query mass spectra generated by shotgun proteomics. Since query spectra contain many inaccuracies and database sizes have grown rapidly in recent years, more accurate mass spectra similarity measures and the use of database indexing techniques are still in demand. We propose a statistical comparison of the parameterized Hausdorff distance with the freely available tools OMSSA and X!Tandem and with the cosine similarity. We show that a precursor mass filter combined with a modification of the previously proposed parameterized Hausdorff distance outperforms state-of-the-art tools in both the speed of search and the number of identified peptide sequences (even though the q-value is only 0.001). Our method is implemented in the freely available application SimTandem, which can be used in the TOPP framework based on OpenMS.
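Of the similarity measures named above, the cosine similarity is the simplest to show. The Python sketch below bins two peak lists by m/z and computes the cosine of the resulting intensity vectors. It does not reproduce the parameterized Hausdorff distance or anything from SimTandem; the bin width, function names and peak values are assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins of the given width.

    peaks is an iterable of (mz, intensity) pairs.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical query and theoretical spectra as (m/z, intensity) pairs.
    query = [(100.2, 3.0), (200.7, 4.0)]
    theoretical = [(100.4, 1.0), (300.1, 1.0)]
    print(cosine_similarity(query, theoretical))
```

In a database search, a score like this would be computed between the query spectrum and each candidate theoretical spectrum that passes a precursor mass filter, and the candidates would then be ranked by score.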

Jiří Novák, Timo Sachsenberg, David Hoksza, Tomáš Skopal, Oliver Kohlbacher

HaptreeBuilder: Web Generation and Visualization of Risk Haplotype Trees

Abstract

The quantity and quality of genome-wide association studies for several diseases are constantly increasing. As a consequence, molecular biologists from different laboratories demand new visualization tools that let them explore results visually and formulate new conjectures to work on. Although most studies today cannot reconstruct individual haplotypes, next-generation sequencing technologies will make it possible to obtain individual haplotypes in most studies conducted in the next few years. Because evolutionary analysis of haplotypes can provide invaluable information for biomedical researchers building hypotheses of genetic variation, we have built a web-based tool for biomedical researchers to build and visualize risk haplotype trees along a chromosome so that they can perform a visual online exploration of the genetic factors associated with complex diseases.

Dimitra Kamari, María Mar Abad-Grau, Fuencisla Matesanz

Speeding Up Phylogenetic Model Checking

Abstract

Model checking is a generic and formal technique that the authors have proposed for the study of properties that emerge from the biological labeling of the states defined over the phylogenetic tree [3] [10]. This strategy allows us to use generic software tools already present in the industry. However, the performance of traditional model checking is penalized when scaling the system for large phylogenies. To this end, two strategies are presented here. The first one consists of partitioning the phylogenetic tree into a set of related subproblems so as to speed up the computation time and distribute the memory consumption. The second strategy is based on uncoupling the information associated to each state of the phylogenetic tree (mainly, the DNA sequence) and exporting it to an external tool for the management of large information systems. The integration of all these approaches outperformed the results of monolithic model checking and helped us to execute the verification of properties in a real phylogenetic tree.

José Ignacio Requeno, José Manuel Colom

Comp2ROC: R Package to Compare Two ROC Curves

Abstract

This paper describes the Comp2ROC package implemented in the R programming language. The theoretical background behind this package is introduced, including a general introduction to ROC curves, their main features, and how ROC curves are applied. Furthermore, methodologies are presented for comparing two ROC curves, both when the two curves intersect and when they do not. Finally, the paper explains how and when to use the Comp2ROC package to compare two ROC curves that intersect. An example application is shown and discussed to fully demonstrate how to use the facilities in this package.
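As background for what comparing two ROC curves involves, the Python sketch below builds the empirical ROC curve of a scoring classifier and computes its area under the curve (AUC) with the trapezoidal rule. It is written in Python for illustration and does not use or mirror the Comp2ROC API; the function names and example scores are assumptions, and it compares AUC values only, without the methodology the paper proposes for intersecting curves.

```python
def roc_points(scores, labels):
    """Return the empirical ROC curve as a list of (FPR, TPR) points.

    labels are 1 for positives and 0 for negatives; higher scores mean
    "more likely positive". Tied scores are handled as a single threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every example that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under a ROC curve given as (FPR, TPR) points, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]                      # hypothetical ground truth
    scores_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]        # hypothetical classifier A
    scores_b = [0.9, 0.8, 0.1, 0.7, 0.3, 0.2]        # hypothetical classifier B
    print(auc(roc_points(scores_a, labels)), auc(roc_points(scores_b, labels)))
```

When two curves cross, one classifier is better in some regions of the false positive rate and worse in others, so two equal AUC values can hide very different behaviour; that is the situation the abstract singles out.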

Hugo Frade, A. C. Braga

Network Visualization Tools to Enhance Metabolic Engineering Platforms

Abstract

In this work, we present a software platform for the visualization of metabolic models, which is implemented as a plug-in for the open-source metabolic engineering (ME) platform OptFlux. The tools provided by this plug-in allow the visualization of the models (or parts of the models) combined with the results from operations applied over these models, mainly regarding phenotype simulation, strain optimization and pathway analysis. The tool provides a generic input/output framework that can import/export layouts from different formats used by other tools, namely XGMML and SBML. Thus, this work provides a bridge between network visualization and ME.

Alberto Noronha, Paulo Vilaça, Miguel Rocha

A Workflow for the Application of Biclustering to Mass Spectrometry Data

Abstract

Biclustering techniques have been successfully applied to analyze microarray data and are beginning to be applied to the analysis of mass spectrometry data, a high-throughput technology for proteomic data analysis that has been an active research area in recent years. In this work, we propose a novel workflow for applying biclustering to MALDI-TOF mass spectrometry data, supported by a desktop software application that covers all of its stages. We evaluate the adequacy of applying biclustering to mass spectrometry data by comparing biclustering and hierarchical clustering on two real datasets. The results are promising, since they reveal the ability of these techniques to extract useful information, opening the door to further work.

Hugo López-Fernández, Miguel Reboiro-Jato, Sara C. Madeira, Rubén López-Cortés, J. D. Nunes-Miranda, H. M. Santos, Florentino Fdez-Riverola, Daniel Glez-Peña

Backmatter

Title: 7th International Conference on Practical Applications of Computational Biology & Bioinformatics
Editors: Mohd Saberi Mohamad, Loris Nanni, Miguel P. Rocha, Florentino Fdez-Riverola
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-00578-2
Print ISBN: 978-3-319-00577-5
DOI: https://doi.org/10.1007/978-3-319-00578-2
