2014 | Book

Advances in Bioinformatics and Computational Biology

9th Brazilian Symposium on Bioinformatics, BSB 2014, Belo Horizonte, Brazil, October 28-30, 2014, Proceedings

Editor: Sérgio Campos

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

Part of: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book constitutes the refereed proceedings of the 9th Brazilian Symposium on Bioinformatics, BSB 2014, held in Belo Horizonte, Brazil, in October 2014. The 18 revised full papers presented were carefully reviewed and selected from 32 submissions. The papers cover all aspects of bioinformatics and computational biology.

Frontmatter

An Extensible Framework for Genomic and Metagenomic Analysis

Abstract

Computational tools for supporting the management of scientific experiments are fundamental to modern science. These tools must be easy to use, extensible and robust. This paper presents a framework for managing bioinformatics experiments, focusing on the analysis of genomic and metagenomic data. The developed system is based on an extension of a scientific workflow management system combined with the development of specific tools for genomic and metagenomic data analysis.

Luciano A. Digiampietri, Vivian M. Y. Pereira, Camila I. Costa, Geraldo J. dos Santos Júnior, Fernando M. Stefanini, Caio R. N. Santiago

On the Multichromosomal Hultman Number

Abstract

The number of cycles of a breakpoint graph is one of the notable parameters for solving distance problems in comparative genomics. For a fixed c, the number of linear unichromosomal genomes with n genes such that the breakpoint graph has c disjoint cycles, the Hultman number, is already determined. In this work we extend this result to multichromosomal genomes, providing formulas to compute the number of multichromosomal genomes having a fixed number of cycles and/or paths.

Pedro Feijão, Fábio Viduani Martinez, Annelyse Thévenin
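
To make the counted quantity concrete, the following is a minimal brute-force sketch for the classical case only: one linear chromosome, signed genes, compared against the identity genome. It does not reproduce the closed formulas or the multichromosomal extension of the paper. The function names and the extremity encoding (gene g becomes 2g-1 and 2g, with caps 0 and 2n+1) are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations, product


def breakpoint_graph_cycles(signed_perm):
    """Count alternating cycles in the breakpoint graph of a signed
    permutation against the identity (one linear chromosome)."""
    n = len(signed_perm)
    # Replace each gene g by its two extremities (2g-1, 2g), swapped when
    # the gene is reversed, and frame the sequence with caps 0 and 2n+1.
    ext = [0]
    for g in signed_perm:
        tail, head = 2 * abs(g) - 1, 2 * abs(g)
        ext.extend((tail, head) if g > 0 else (head, tail))
    ext.append(2 * n + 1)

    # Black edges join extremities that are adjacent in the given genome.
    black = {}
    for i in range(0, 2 * n + 2, 2):
        u, v = ext[i], ext[i + 1]
        black[u], black[v] = v, u

    # Gray edges join extremities adjacent in the identity: (2j, 2j+1).
    def gray(v):
        return v + 1 if v % 2 == 0 else v - 1

    # Every vertex has one black and one gray edge, so the graph splits
    # into alternating cycles; walk each one exactly once.
    seen, cycles = set(), 0
    for start in ext:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            w = black[v]
            seen.update((v, w))
            v = gray(w)
    return cycles


def cycle_distribution(n):
    """Tally all 2^n * n! signed permutations of n genes by cycle count."""
    counts = Counter()
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            perm = [s * g for s, g in zip(signs, order)]
            counts[breakpoint_graph_cycles(perm)] += 1
    return counts
```

Calling `cycle_distribution(n)` enumerates every signed permutation, so it is only practical for small n; its value is as a cross-check for closed formulas, not as a replacement for them.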

Towards an Ensemble Learning Strategy for Metagenomic Gene Prediction

Abstract

Metagenomics is an emerging field in which the power of genome analysis is applied to entire communities of microbes. A large variety of classifiers has been developed for gene prediction, though there is a lack of empirical evaluation of the core machine learning techniques implemented in these tools. In this work we present an empirical performance evaluation of classification strategies for metagenomic gene prediction. This comparison takes into account distinct supervised learning strategies: one lazy learner, two eager learners and one ensemble learner. Though the performance of the four base classifiers was good, the ensemble-based strategy with Random Forest achieved the best overall result.

Fabiana Goés, Ronnie Alves, Leandro Corrêa, Cristian Chaparro, Lucinéia Thom

FUNN-MG: A Metagenomic Systems Biology Computational Framework

Abstract

Microorganisms abound everywhere. Though we know they play key roles in several ecosystems, too little is known about how these complex communities work. To act as a community they must interact with each other in order to achieve a stability in which proper functions allow the microbial community to adapt to complex environmental conditions. Thus, to effectively understand microbial genetic networks one needs to explore them by means of a systems biology approach. The proposed approach extends the metagenomic gene-centric view by taking into account the set of genes present in a metagenome and also the complex links of interactions among these genes, and by treating the microbiome as a single biological system. In this paper, we present the FUNN-MG computational framework to explore functional modules in microbial genetic networks.

Leandro Corrêa, Ronnie Alves, Fabiana Goés, Cristian Chaparro, Lucinéia Thom

FluxMED: An Adaptable and Extensible Electronic Health Record System

Abstract

The amount of data generated by medical and laboratory services grows each day. The number of patients is increasing, modern examination methods generate large amounts of data, and the growing specialization of the medical profession makes the problem of storing and managing this data very complex. Computer applications known as Laboratory Information Management Systems (LIMS) have been proposed as tools to address this issue. In this work we propose the FluxMED system, a fully customizable EHR system with an easy-to-adapt interface for data collection and retrieval. FluxMED can easily be customized to manage different types of medical data. The customization for a new disease can be done in a few hours with the help of a specialist. We have used FluxMED to manage data from patients with three complex diseases: neuromyelitis optica, paracoccidioidomycosis and adrenoleukodystrophy. These diseases have very different symptoms, require different exams to reach a diagnosis, and have different treatments. However, FluxMED is able to manage these data in a highly specialized manner without any modifications to its code.

Alessandra C. Faria-Campos, Lucas Hanke, Paulo Henrique Batista, Vinícius Garcia, Sérgio Campos

Influence of Sequence Length in Promoter Prediction Performance

Abstract

The rapid growth in sequencing capacity for new genomes has made evident the need for automated data analysis to speed up the genomic annotation process and reduce its cost. Given that promoter identification is an important step in functional genomic annotation, several studies have proposed computational approaches to predict promoters. Different classifiers and characteristics of the promoter sequences have been used to deal with this prediction problem. However, several works in the literature have addressed the promoter prediction problem using datasets containing sequences of 250 nucleotides or more. As the sequence length defines the number of dataset attributes, even when a limited number of properties is used to characterize the sequences, datasets with a high number of attributes are generated for training classifiers. Since high-dimensional datasets can degrade the classifiers' predictive performance or even require an infeasible processing time, training classifiers on datasets with a reduced number of attributes is essential to obtain good predictive performance at low computational cost. To the best of our knowledge, no work in the literature has systematically verified the relation between sequence length and the predictive performance of classifiers. Thus, in this work, sixteen datasets composed of sequences of different sizes are built and evaluated using the SVM and k-NN classifiers. The experimental results show that several datasets composed of shorter sequences achieved better predictive performance than datasets composed of longer sequences, and consumed significantly less processing time.

Sávio G. Carvalho, Renata Guerra-Sá, Luiz H. de C. Merschmann

Evolution of Genes Neighborhood within Reconciled Phylogenies: An Ensemble Approach

Abstract

We consider a recently introduced dynamic programming scheme to compute parsimonious evolutionary scenarios for gene adjacencies. We extend this scheme to sample evolutionary scenarios from the whole solution space under the Boltzmann distribution. We apply our algorithms to a dataset of mammalian gene trees and adjacencies, and observe a significant reduction of the number of syntenic inconsistencies observed in the resulting ancestral gene adjacencies.

Cedric Chauve, Yann Ponty, João Paulo Pereira Zanetti

Dynamic Programming for Set Data Types

Abstract

We present an efficient generalization of algebraic dynamic programming (ADP) to unordered data types and a formalism for the automated derivation of outside grammars from their inside progenitors. These theoretical contributions are illustrated by ADP-style algorithms for shortest Hamiltonian path problems. These arise naturally when asking whether the evolutionary history of an ancient gene cluster can be explained by a series of local tandem duplications. Our framework makes it easy to compute maximum accuracy solutions, which in turn require the computation of the probabilities of individual edges in the ensemble of Hamiltonian paths. The expansion of the Hox gene clusters is investigated as a showcase application. For implementation details see http://www.bioinf.uni-leipzig.de/Software/setgram/

Christian Höner zu Siederdissen, Sonja J. Prohaska, Peter F. Stadler
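
As a rough illustration of dynamic programming over sets, the sketch below solves the shortest Hamiltonian path problem with the classic subset recurrence (often called Held-Karp), indexed by the set of visited nodes and the last node. It is plain Python, not the authors' ADP framework, and it does not compute edge probabilities or outside grammars; the function name and the distance-matrix input are assumptions made for the example.

```python
def shortest_hamiltonian_path(dist):
    """Return the minimum total cost of a path that visits every node once.

    dist is a square matrix; dist[j][k] is the cost of stepping from j to k.
    The path may start and end at any node.
    """
    n = len(dist)
    inf = float("inf")
    # best[mask][j]: cheapest path that visits exactly the nodes in the
    # bitmask `mask` and ends at node j.
    best = [[inf] * n for _ in range(1 << n)]
    for j in range(n):
        best[1 << j][j] = 0  # a path consisting of a single node

    # Masks are processed in increasing order, so every subset is final
    # before any of its supersets is extended.
    for mask in range(1 << n):
        for j in range(n):
            cur = best[mask][j]
            if cur == inf:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue  # node k is already on the path
                nxt = mask | (1 << k)
                cand = cur + dist[j][k]
                if cand < best[nxt][k]:
                    best[nxt][k] = cand

    return min(best[(1 << n) - 1])
```

The table has 2^n * n entries, so this is feasible for small clusters only, which matches the gene-cluster setting where the number of elements is modest.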

Using Binary Decision Diagrams (BDDs) for Memory Optimization in Basic Local Alignment Search Tool (BLAST)

Abstract

Sequence alignment is the procedure of comparing two or more DNA or protein sequences in order to find similarities between them. One of the tools used for this purpose is the Basic Local Alignment Search Tool (BLAST). BLAST, however, presents limits on the size of sequences that can be analyzed, requiring a lot of memory and time for long sequences. Therefore, improvements can be made to overcome these limitations. In this work we propose the use of the Binary Decision Diagram (BDD) data structure to represent alignments obtained through BLAST, which offers a compressed and efficient representation of the aligned sequences. We have developed a BDD-based version of BLAST, which omits any redundant information shared by the aligned sequences. We have observed a considerable improvement in memory usage, saving up to 63.95% memory, with a negligible performance degradation of only 3.10%. This approach could improve alignment methods, obtaining compact and efficient representations, which could allow the alignment of longer sequences, such as genome-wide human sequences, to be used in population and migration studies.

Demian Oliveira, Fernando Braz, Bruno Ferreira, Alessandra Faria-Campos, Sérgio Campos

A Multi-Objective Evolutionary Algorithm for Improving Multiple Sequence Alignments

Abstract

Multiple Sequence Alignments are essential tools for many tasks performed in molecular biology. This paper proposes an efficient, scalable and effective multi-objective evolutionary algorithm to optimize pre-aligned sequences. This algorithm benefits from the great diversity of state-of-the-art algorithms and produces alignments that do not depend on specific sequence features. The proposed method is validated with a database of refined multiple sequence alignments and uses four standard metrics to compare the quality of the results.

Wilson Soto, David Becerra

BION2SEL: An Ontology-Based Approach for the Selection of Molecular Biology Databases

Abstract

The catalogs of molecular biology databases do not provide a full description of the databases, so the user must select databases using the limited information available. Taking this into account, in the context of an initiative called BioDBCore, a group of experts proposes core metadata definitions to describe molecular biology databases. However, how to use these metadata to infer the quality of a database is a clear open issue. In the present work, we propose an ontology-based approach aiming to guide the database selection process from molecular biology database catalogs using these metadata.

Daniel Lichtnow, Ronnie Alves, Oscar Pastor, Verónica Burriel, José Palazzo Moreira de Oliveira

Structural Comparative Analysis of Secreted NTPDase Models of Schistosoma mansoni and Homo sapiens

Abstract

The control of extracellular nucleoside concentrations by Nucleoside Triphosphate Diphosphohydrolase (NTPDase) is essential in the regulation of purinergic signalling and also in the immune response. In humans, eight members (HsNTPDase) were identified as transmembrane and secreted proteins. In Schistosoma mansoni, the causative agent of schistosomiasis, NTPDases similar to the human enzymes have also been identified. The expression of these enzymes in S. mansoni (SmATPDases) is related to the weakening of the immune and inflammatory responses of the host against infections. Despite the high phylogenetic conservation between these proteins, SmATPDases have been reported as molecular target candidates for antischistosomal treatment. In this work, we constructed three-dimensional models for secreted SmATPDase and HsNTPDase6, using the comparative modeling technique. The comparative structural analysis aims to investigate possible differences that could help future work on the development of new therapies that minimize the risk of cross inhibition.

Vinicius Carius de Souza, Vinicius Schmitz Nunes, Eveline Gomes Vasconcelos, Priscila Faria-Pinto, Priscila V. S. Z. Capriles

Length and Symmetry on the Sorting by Weighted Inversions Problem

Abstract

Large-scale mutational events that occur when stretches of DNA sequence move throughout genomes are called genome rearrangement events. In bacteria, inversions are one of the most frequently observed rearrangements. In some bacterial families, inversions are biased in favor of symmetry, as shown by recent research [6, 8, 10]. In addition, several results suggest that short segment inversions are more frequent in the evolution of microbial genomes [4, 6, 15]. Despite the fact that symmetry and length of the reversed segments seem very important, they have not been considered together in any problem in the genome rearrangement field. Here, we define the problem of sorting genomes (or permutations) using inversions whose costs are assigned based on their lengths and asymmetries. We present five procedures and assess their performance on small permutations. The ideas presented in this paper provide insights to solve the problem and set the stage for a proper theoretical analysis.

Christian Baudet, Ulisses Dias, Zanoni Dias

Storage Policy for Genomic Data in Hybrid Federated Clouds

Abstract

Execution performance of bioinformatics workflows in federated cloud environments is strongly affected by data storage and retrieval, due to the large volumes of information in genomic sequences. This paper presents a storage policy for files used in a typical bioinformatics application with genomic data that aims to reduce their transfer time and thus contribute to a faster execution of the workflow. We discuss a case study using the BioNimbuZ federated cloud platform. Our results show that this storage policy significantly improved file transfer times, and thus lowered the total time to execute the workflow.

Ricardo Gallon, Maristela Holanda, Aletéia Araújo, Maria E. Walter

Genome-Wide Identification of Non-coding RNAs in Komagatella pastoris str. GS115

Abstract

The methylotrophic yeast Komagatella pastoris is a relevant bioengineering platform for protein synthesis. Even though non-coding RNAs are well known to be key players in the control of gene expression, no comprehensive annotation of non-coding RNAs has been reported for this species. Here we combine published RNA-seq data with a wide array of homology-based annotation tools and de novo gene predictions to compile the non-coding RNAs in K. pastoris.

Hugo Schneider, Sebastian Bartschat, Gero Doose, Lucas Maciel, Erick Pizani, Marcelo Bassani, Fernando Araripe Torres, Sebastian Will, Tainá Raiol, Marcelo Brígido, Maria Emília Walter, Peter Stadler

Multi-scale Simulation of T Helper Lymphocyte Differentiation

Abstract

The complex differentiation process of the CD4+ T helper lymphocytes shapes the form and the range of the immune response to different antigenic challenges. Few mathematical and computational models have addressed this key phenomenon. We here present a multiscale approach in which two different levels of description, i.e. a gene regulatory network model and an agent-based simulator for cell population dynamics, are integrated into a single immune system model. We illustrate how such model integration allows bridging a gap between gene level information and cell level population, and how the model is able to describe a coherent immunological behaviour when challenged with different stimuli.

P. Tieri, V. Prana, T. Colombo, D. Santoni, F. Castiglione

Scaffolding of Ancient Contigs and Ancestral Reconstruction in a Phylogenetic Framework

Abstract

Ancestral genome reconstruction is an important step in analyzing the evolution of genomes. Recent progress in sequencing ancient DNA led to the publication of so-called paleogenomes and allows the integration of this sequencing data in genome evolution analysis. However, the assembly of ancient genomes is fragmented because of DNA degradation over time. Integrated phylogenetic assembly addresses the issue of genome fragmentation in the ancient DNA assembly while improving the reconstruction of all ancient genomes in the phylogeny. The fragmented assembly of the ancient genome can be represented as an assembly graph, indicating contradicting ordering information of contigs.

In this setting, our approach is to compare the ancient data with extant finished genomes. We generalize a reconstruction approach minimizing the Single-Cut-or-Join rearrangement distance towards multifurcating trees and include edge lengths to avoid a sparse reconstruction in practice. When also including the additional conflicting ancient DNA data, we can still ensure consistent reconstructed genomes.

Nina Luhmann, Cedric Chauve, Jens Stoye, Roland Wittler
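
For readers unfamiliar with the Single-Cut-or-Join (SCJ) distance mentioned above, here is a minimal sketch of the pairwise distance only, using its standard definition as the number of adjacencies present in exactly one of two genomes with the same gene content. It does not implement the tree-wide reconstruction, edge lengths, or the handling of conflicting ancient DNA data described in the abstract. The genome encoding (lists of signed integers, one list per linear chromosome) and the function names are assumptions made for illustration.

```python
def adjacencies(genome):
    """Return the set of adjacencies of a genome.

    genome is a list of linear chromosomes; each chromosome is a list of
    signed integers, where the sign gives the gene's orientation.
    Each gene g has a tail (g, 't') and a head (g, 'h').
    """
    adj = set()
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:]):
            # The right end of a forward gene is its head; of a reversed
            # gene, its tail. The left end is the opposite extremity.
            right_of_a = (abs(a), "h" if a > 0 else "t")
            left_of_b = (abs(b), "t" if b > 0 else "h")
            adj.add(frozenset((right_of_a, left_of_b)))
    return adj


def scj_distance(genome_a, genome_b):
    """Cuts needed to remove adjacencies unique to A plus joins needed to
    create adjacencies unique to B."""
    a, b = adjacencies(genome_a), adjacencies(genome_b)
    return len(a - b) + len(b - a)
```

For example, `scj_distance([[1, 2, 3]], [[1, -2, 3]])` cuts the two adjacencies around gene 2 and joins two new ones, while reading a chromosome in the opposite direction, as in `[[-3, -2, -1]]`, leaves the adjacency set unchanged.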

Quality Metrics for Benchmarking Sequences Comparison Tools

Abstract

Comparing sequences is a daily task in bioinformatics, and many software tools try to fulfill this need by offering fast execution times and accurate results. Introducing a new tool in this field requires comparing it to recognized tools with the help of well-defined metrics. A set of quality metrics is proposed that enables a systematic approach for comparing alignment tools. These metrics have been implemented in dedicated software that produces textual and graphical benchmark artifacts.

Erwan Drezen, Dominique Lavenier

Backmatter

Title: Advances in Bioinformatics and Computational Biology
Editor: Sérgio Campos
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-12418-6
Print ISBN: 978-3-319-12417-9
DOI: https://doi.org/10.1007/978-3-319-12418-6
