
About this book

The growth of the Bioinformatics and Computational Biology fields over the last few years has been remarkable, and its pace is likely to increase. The need for computational techniques that can efficiently handle the huge amounts of data produced by new experimental techniques in Biology keeps growing, driven by advances in Next Generation Sequencing, several types of so-called omics data and image acquisition, to name just a few. Analysing and integrating the resulting datasets calls for new algorithms and approaches from fields such as Databases, Statistics, Data Mining, Machine Learning, Optimization, Computer Science and Artificial Intelligence. Within this scenario of increasing data availability, Systems Biology has also emerged as an alternative to the reductionist view that dominated biological research in recent decades. Indeed, Biology is increasingly a science of information, requiring tools from the computational sciences. In the last few years, we have seen the rise of a new generation of interdisciplinary scientists with a strong background in both the biological and the computational sciences. In this context, interaction between researchers from different scientific fields is, more than ever, of foremost importance, boosting research efforts in the field and contributing to the education of a new generation of Bioinformatics scientists. PACBB'12 hopes to contribute to this effort by promoting this fruitful interaction. The PACBB'12 technical program included 32 papers, selected from a pool of 61 submissions, spanning many different sub-fields of Bioinformatics and Computational Biology. The conference will therefore have promoted interaction among scientists from diverse research groups and with distinct backgrounds (computer scientists, mathematicians, biologists).
The scientific content will certainly be challenging and will help improve the work being developed by each of the participants.



Parallel Spectral Clustering for the Segmentation of cDNA Microarray Images

Microarray technology makes it possible to measure the expression levels of large numbers of genes simultaneously. Analysing these data requires segmenting the microarray image to extract quantitative information from the spots. Spectral clustering is one of the most relevant unsupervised methods, able to group data without a priori information on shapes or locality. We propose, and test on microarray images, a parallel strategy for the spectral clustering method based on domain decomposition, together with a criterion to determine the number of clusters.

Sandrine Mouysset, Ronan Guivarch, Joseph Noailles, Daniel Ruiz
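The first abstract relies on spectral clustering. As a hedged illustration of the underlying sequential idea only, the sketch below splits points into two groups using the sign of the second eigenvector of the normalized affinity matrix, computed by power iteration in plain Python. The Gaussian width `sigma`, the zero-diagonal affinity, the two-cluster restriction and the sample intensities are assumptions of this sketch; it does not reproduce the authors' domain-decomposition parallelism or their criterion for choosing the number of clusters.

```python
import math


def spectral_bipartition(points, sigma=1.0, iterations=200):
    """Two-way spectral clustering (normalized affinity, Ng-Jordan-Weiss style).

    points: list of coordinate tuples, e.g. (x, y, intensity) for image pixels.
    Returns a 0/1 label per point from the sign of the second eigenvector.
    Assumes every point has a non-negligible affinity to some other point.
    """
    n = len(points)
    # Gaussian affinity matrix with zero diagonal
    A = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A]
    # S = (D^{-1/2} A D^{-1/2} + I) / 2: same eigenvectors, eigenvalues in [0, 1]
    S = [[(A[i][j] / math.sqrt(d[i] * d[j]) + (i == j)) / 2 for j in range(n)]
         for i in range(n)]
    # The leading eigenvector is proportional to sqrt(d); project it out.
    total = sum(d)
    v1 = [math.sqrt(x / total) for x in d]
    v = [float(i + 1) for i in range(n)]  # arbitrary starting vector
    for _ in range(iterations):
        overlap = sum(a * b for a, b in zip(v, v1))
        v = [a - overlap * b for a, b in zip(v, v1)]
        v = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]  # power step
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    return [0 if x >= 0 else 1 for x in v]


# Hypothetical 1-D spot intensities forming two well-separated groups.
labels = spectral_bipartition([(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)])
print(labels)
```

The two groups receive different labels; which group is called 0 depends on the sign the power iteration happens to converge to.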

Prognostic Prediction Using Clinical Expression Time Series: Towards a Supervised Learning Approach Based on Meta-biclusters

Biclustering has been recognized as a remarkably effective method for discovering local temporal expression patterns and unraveling potential regulatory mechanisms, which are critical to understanding complex biomedical processes such as disease progression and drug response. In this work, we propose a classification approach based on meta-biclusters (sets of similar biclusters) applied to prognostic prediction. We use real clinical expression time series to predict the response of patients with multiple sclerosis to treatment with Interferon-β. The main advantages of this strategy are the interpretability of the results and the reduction of data dimensionality achieved through biclustering. Preliminary results suggest that it is possible to recognize the most promising genes and time points explaining different types of response profiles, in agreement with clinical knowledge. We also study the impact of different techniques for unsupervised discretization of the data on classification accuracy.

André V. Carreiro, Artur J. Ferreira, Mário A. T. Figueiredo, Sara C. Madeira
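The abstract above studies how unsupervised discretization affects classification accuracy but does not name the techniques. A minimal sketch of two common unsupervised schemes, equal-width and equal-frequency binning, applied to one gene's expression values; both the choice of schemes and the sample values are illustrative assumptions.

```python
def equal_width(values, n_bins):
    """Map each value to a bin index in [0, n_bins) using bins of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min() keeps the maximum value in the last bin instead of bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency(values, n_bins):
    """Map each value to a bin so that bins hold (roughly) equal numbers of values.

    Ties are split arbitrarily by sort order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


series = [0.1, 0.2, 0.25, 0.3, 5.0]  # hypothetical expression values over time
print(equal_width(series, 3))      # -> [0, 0, 0, 0, 2]
print(equal_frequency(series, 3))  # -> [0, 0, 1, 1, 2]
```

Equal-width bins are sensitive to outliers, as the single high value shows, while equal-frequency bins always spread the values across all bins.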

Parallel e-CCC-Biclustering: Mining Approximate Temporal Patterns in Gene Expression Time Series Using Parallel Biclustering

The ability to monitor changes in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series obtained from either microarray or RNA-seq technologies, is critical to advance our understanding of complex biomedical processes such as growth, development, response to stimuli, disease progression and drug response. In this paper, we propose parallel e-CCC-Biclustering, a parallel version of the state-of-the-art e-CCC-Biclustering algorithm, an efficient exhaustive-search biclustering algorithm for mining approximate temporal expression patterns. Parallel e-CCC-Biclustering was implemented using functional programming and achieved a super-linear speed-up over the original sequential algorithm in test cases using synthetic data.

Filipe Cristóvão, Sara C. Madeira
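e-CCC-Biclustering works on discretized time series, looking for genes that share a contiguous pattern of symbols with at most e errors; the published algorithm does this exhaustively with suffix trees. The brute-force sketch below only illustrates the error-tolerant matching criterion for one candidate pattern, and the per-gene independence that makes the work easy to map over workers. The three-symbol alphabet (D/N/U), the 0.5 threshold, the helper names and the thread pool are assumptions of this sketch, not the authors' functional-programming implementation (in CPython, threads mainly illustrate the map structure rather than deliver CPU speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def to_symbols(series, threshold=0.5):
    """Discretize consecutive changes: U (up), D (down) or N (no change)."""
    symbols = []
    for a, b in zip(series, series[1:]):
        delta = b - a
        symbols.append("U" if delta > threshold else "D" if delta < -threshold else "N")
    return "".join(symbols)


def matches(symbols, pattern, start, max_errors):
    """True if `pattern` fits the contiguous columns beginning at `start`
    with at most `max_errors` mismatching symbols."""
    window = symbols[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False
    mismatches = sum(1 for s, p in zip(window, pattern) if s != p)
    return mismatches <= max_errors


def approximate_bicluster(genes, pattern, start, max_errors, workers=4):
    """Genes (rows) that share `pattern` over the same contiguous columns.

    Each gene is tested by a pure function, so the checks are independent
    and can be mapped over a pool of workers.
    """
    names = list(genes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(lambda g: matches(genes[g], pattern, start, max_errors), names))
    return [g for g, ok in zip(names, hits) if ok]


raw = {"g1": [0, 1, 2, 2, 1], "g2": [0, 1, 2, 3, 2], "g3": [3, 2, 1, 0, 0]}
genes = {g: to_symbols(s) for g, s in raw.items()}
print(genes)                                      # {'g1': 'UUND', 'g2': 'UUUD', 'g3': 'DDDN'}
print(approximate_bicluster(genes, "UUN", 0, 1))  # ['g1', 'g2']
```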

Identification of Regulatory Binding Sites on mRNA Using in Vivo Derived Informations and SVMs

Proteins able to interact with ribonucleic acids (RNA) are involved in many cellular processes. Detailed knowledge of the binding pairs is needed to build computational models that can spare time-consuming biological experiments. This paper addresses the creation of a model based on support vector machines and trained on experimentally validated data. The goal is to identify RNA molecules that bind specifically to a regulatory protein called CELF1.

Carmen Maria Livi, Luc Paillard, Enrico Blanzieri, Yann Audic

Parameter Influence in Genetic Algorithm Optimization of Support Vector Machines

Support Vector Machines (SVMs) are a well-established and powerful classification method that finds the minimal-risk separation between different classes. Finding that separation depends strongly on the available feature set. Feature selection and SVM parameter optimization both improve classification accuracy. This paper studies their joint optimization and the improvement it yields. A comparison was made using genetic algorithms to find the best parameters for SVM classification. Results show that the RBF kernel returns better results on average, though the best optimization for some data sets depends highly on the choice of parameters and kernels. Overall, we obtained an average relative improvement of 26%, with a standard deviation of 8%.

Paulo Gaspar, Jaime Carbonell, José Luís Oliveira
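As a hedged sketch of genetic-algorithm parameter search, the code below evolves pairs (log2 C, log2 gamma) for an RBF-kernel SVM. The fitness function `cv_accuracy` is a hypothetical stand-in for cross-validated accuracy (a toy surface, so the example runs without an SVM library), and the population size, tournament selection, arithmetic crossover, Gaussian mutation and search ranges are illustrative choices rather than those of the paper. Joint feature selection, which could be encoded as an extra bit mask in each chromosome, is omitted.

```python
import random


def cv_accuracy(log2_c, log2_gamma):
    """Hypothetical stand-in for cross-validated SVM accuracy.

    In practice this would train an RBF-kernel SVM with C = 2**log2_c and
    gamma = 2**log2_gamma and return its validation accuracy. Here it is a
    smooth toy surface peaking at log2_c = 3, log2_gamma = -5.
    """
    return 1.0 / (1.0 + 0.05 * ((log2_c - 3) ** 2 + (log2_gamma + 5) ** 2))


BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]  # assumed search ranges for log2 C, log2 gamma


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def genetic_search(fitness, bounds, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda ind: fitness(*ind), reverse=True)
        next_pop = scored[:2]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            # tournament selection: best of three random individuals
            p1 = max(rng.sample(pop, 3), key=lambda ind: fitness(*ind))
            p2 = max(rng.sample(pop, 3), key=lambda ind: fitness(*ind))
            w = rng.random()
            child = []
            for (a, b), (lo, hi) in zip(zip(p1, p2), bounds):
                gene = w * a + (1 - w) * b              # arithmetic crossover
                gene += rng.gauss(0, 0.1 * (hi - lo))   # Gaussian mutation
                child.append(clip(gene, lo, hi))
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda ind: fitness(*ind))


best = genetic_search(cv_accuracy, BOUNDS)
print("log2 C = %.2f, log2 gamma = %.2f, score = %.3f"
      % (best[0], best[1], cv_accuracy(*best)))
```

Searching in log2 space is a common convention for C and gamma because useful values span several orders of magnitude; elitism guarantees the best score never decreases between generations.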

Biological Knowledge Integration in DNA Microarray Gene Expression Classification Based on Rough Set Theory

DNA microarrays have contributed to the exponential growth of genetic data over the years. One possible application of this large amount of gene expression data is the diagnosis of diseases such as cancer using classification methods. Meanwhile, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge into the classification process using Rough Set Theory, making it more effective. In addition, the proposed model is able to indicate which parts of the biological knowledge were used in building the model and in classifying new samples.

D. Calvo-Dmgz, J. F. Galvez, Daniel Glez-Peña, Florentino Fdez-Riverola
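Rough Set Theory approximates a concept (for example, a set of tumour samples) by the union of indiscernibility classes lying fully inside it (lower approximation) and the union of classes overlapping it (upper approximation). The sketch computes both for a hypothetical table of discretized expression values; how the paper injects biological knowledge about gene functions is not reproduced here.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group objects that are indiscernible on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) rough-set approximations of the set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attributes):
        if cls <= target:
            lower |= cls  # class lies entirely inside the concept
        if cls & target:
            upper |= cls  # class overlaps the concept
    return lower, upper


# Hypothetical discretized expression levels for two genes in five samples.
samples = {
    "s1": {"geneA": "high", "geneB": "low"},
    "s2": {"geneA": "high", "geneB": "low"},
    "s3": {"geneA": "low", "geneB": "low"},
    "s4": {"geneA": "low", "geneB": "high"},
    "s5": {"geneA": "high", "geneB": "high"},
}
tumour = {"s1", "s4", "s5"}
lower, upper = approximations(samples, ["geneA", "geneB"], tumour)
print(sorted(lower))  # ['s4', 's5']
print(sorted(upper))  # ['s1', 's2', 's4', 's5']
```

Samples in the upper but not the lower approximation (here s1 and s2) form the boundary region: the chosen genes cannot tell them apart.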

Quantitative Assessment of Estimation Approaches for Mining over Incomplete Data in Complex Biomedical Spaces: A Case Study on Cerebral Aneurysms

Biomedical data sources are typically compromised by fragmented data records. This incompleteness of data reduces the confidence gained from the application of mining algorithms. In this paper an approach to approximate missing data items is presented, which enables data mining processes to be applied on a larger data set. The proposed framework is based on a case-based reasoning infrastructure which is used to identify those data entries that are more appropriate to support the approximation of missing values. Moreover, the framework is evaluated in the context of a complex biomedical domain: intracranial cerebral aneurysms. The dataset used includes a wide diversity of advanced features obtained from clinical data, morphological analysis, and hemodynamic simulations. The best feature estimations achieved errors of only 7%. There are, however, large differences between the estimation accuracy achieved with different features.

Jesus Bisbal, Gerhard Engelbrecht, Alejandro F. Frangi
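The abstract above describes case-based retrieval of the most appropriate records to approximate missing values. A minimal stand-in for that retrieval step is k-nearest-neighbour imputation; the sketch assumes numeric features, Euclidean distance over the fields the incomplete record has, averaging over k = 2 retrieved cases, and hypothetical feature names. It is not the authors' similarity measure.

```python
import math


def impute(record, cases, k=2):
    """Fill None fields of `record` using the k most similar cases.

    Similarity is Euclidean distance over the fields that `record` does have;
    each missing field is estimated as the mean of that field in the
    retrieved cases that have it.
    """
    known = [f for f, v in record.items() if v is not None]
    missing = [f for f, v in record.items() if v is None]

    def distance(case):
        return math.sqrt(sum((case[f] - record[f]) ** 2 for f in known))

    # retrieve only cases that have every field we compare on
    candidates = [c for c in cases if all(c.get(f) is not None for f in known)]
    nearest = sorted(candidates, key=distance)[:k]
    filled = dict(record)
    for f in missing:
        values = [c[f] for c in nearest if c.get(f) is not None]
        filled[f] = sum(values) / len(values) if values else None
    return filled


# Hypothetical aneurysm features (illustrative names; real features
# would be normalized first so no single field dominates the distance).
cases = [
    {"size": 5.0, "ratio": 1.2, "wss": 2.0},
    {"size": 6.0, "ratio": 1.4, "wss": 1.5},
    {"size": 12.0, "ratio": 2.5, "wss": 0.8},
]
record = {"size": 5.5, "ratio": 1.3, "wss": None}
print(impute(record, cases))  # {'size': 5.5, 'ratio': 1.3, 'wss': 1.75}
```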

ASAP: An Automated System for Scientific Literature Search in PubMed Using Web Agents

In this paper we present ASAP (Automated Search with Agents in PubMed), a web-based service that aims to manage and automate scientific literature search in the PubMed database. The system allows users to create and manage web agents, parameterized thematically and functionally, that automatically and periodically crawl the PubMed database to search for and retrieve relevant results according to the requirements provided by the user. The results, containing the list of retrieved publications, are emailed to the agent owner weekly during the activity period programmed for the web agent. The ASAP service is intended to help researchers, especially in biomedicine and bioinformatics, increase their productivity, and can be accessed at:

Carlos Carvalhal, Sérgio Deusdado, Leonel Deusdado

Case-Based Reasoning to Classify Endodontic Retreatments

Within the field of odontology, an analysis of the probability of success of endodontic retreatment facilitates the diagnostic and decision-making process of medical personnel. This study presents a case-based reasoning system that predicts the probability of success and failure of retreatments performed to avoid extraction. Different classifiers were applied during the reuse phase of the case-based reasoning process. The system was tested on a set of patients who received retreatments, and a set of variables considered to be of particular interest was selected.

Livia Campo, Vicente Vera, Enrique Garcia, Juan F. De Paz, Juan M. Corchado

A Comparative Analysis of Balancing Techniques and Attribute Reduction Algorithms

In this study we analyze several data balancing techniques and attribute reduction algorithms and their impact on the information retrieval process. Specifically, we study their performance when used in biomedical text classification with Support Vector Machines (SVMs) based on linear, radial, polynomial and sigmoid kernels. From experiments on the TREC Genomics 2005 public biomedical text corpus we conclude that these techniques are necessary to improve the classification process. The kernels' results improved somewhat when attribute reduction algorithms were used. Moreover, when both balancing techniques and attribute reduction algorithms are applied, oversampling yields better results than subsampling.

R. Romero, E. L. Iglesias, L. Borrajo
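The abstract above does not name its balancing techniques. The simplest forms of the two it compares are random oversampling (duplicating minority-class examples) and random subsampling (discarding majority-class examples), sketched below under that assumption with a hypothetical six-document corpus.

```python
import random
from collections import Counter


def _group(samples, labels):
    """Collect samples by class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group(samples, labels)
    target = max(len(xs) for xs in groups.values())
    balanced = []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    return balanced


def random_subsample(samples, labels, seed=0):
    """Keep a random subset of each class, the size of the smallest class."""
    rng = random.Random(seed)
    groups = _group(samples, labels)
    target = min(len(xs) for xs in groups.values())
    balanced = []
    for y, xs in groups.items():
        balanced.extend((x, y) for x in rng.sample(xs, target))
    return balanced


# Hypothetical corpus: 5 non-relevant documents, 1 relevant document.
docs = ["d1", "d2", "d3", "d4", "d5", "d6"]
labels = ["non-relevant"] * 5 + ["relevant"]
print(Counter(y for _, y in random_oversample(docs, labels)))
print(Counter(y for _, y in random_subsample(docs, labels)))
```

Oversampling keeps every original document at the cost of duplicates, while subsampling discards data; with a single relevant document, subsampling here leaves only two training examples.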

Sliced Model Checking for Phylogenetic Analysis

Model checking provides a powerful and flexible formal framework to state and verify biological properties on phylogenies. However, current model checking techniques fail to scale up to the large amount of biological information relevant to each state of the system. This motivates the development of novel cooperative algorithms and slicing techniques that distribute not the graph structure of a phylogenetic system but the state information contained in its nodes.

José Ignacio Requeno, Roberto Blanco, Gregorio de Miguel Casado, José Manuel Colom

PHYSER: An Algorithm to Detect Sequencing Errors from Phylogenetic Information

Sequencing errors can be difficult to detect because new data are produced at such a high rate that manual curation is unfeasible. To address this shortcoming we have developed a phylogenetically inspired algorithm to assess the quality of new sequences given a related phylogeny. Its performance and efficiency have been evaluated with human mitochondrial DNA data.

Jorge Álvarez-Jarreta, Elvira Mayordomo, Eduardo Ruiz-Pesini

A Systematic Approach to the Interrogation and Sharing of Standardised Biofilm Signatures

The study of microorganism consortia, also known as biofilms, is associated with a number of applications in biotechnology, ecotechnology and clinical domains. A public repository of existing biofilm studies would aid in the design of new studies as well as promote collaborative and incremental work. However, bioinformatics approaches are hampered by limited access to existing data. Scientific publications summarise the studies, whilst results are kept in researchers' private ad hoc

Since collecting and comparing existing data is imperative to move forward in biofilm analysis, the present work has addressed the development of a systematic, computer-amenable approach to biofilm data organisation and standardisation. A set of in-house studies involving pathogens and employing different state-of-the-art devices and methods of analysis was used to validate the approach. The approach now supports the activities of BiofOmics, a public repository of biofilm signatures (

Anália Lourenço, Andreia Ferreira, Maria Olivia Pereira, Nuno F. Azevedo

Visual Analysis Tool in Comparative Genomics

Detecting regions with mutations associated with different pathologies is an important step in selecting relevant genes, proteins or diseases. The corresponding information of the mutations and genes is distributed in different public sources and databases, so it is necessary to use systems that can contrast different sources and select conspicuous information. This work presents a visual analysis tool that automatically selects relevant segments and the associated genes or proteins that could determine different pathologies.

Juan F. De Paz, Carolina Zato, María Abáigar, Ana Rodríguez-Vicente, Rocío Benito, Jesús M. Hernández

From Networks to Trees

Phylogenetic networks are a useful way of displaying relationships between nucleotide or protein sequences. They differ from phylogenetic trees in that networks contain cycles, representing several possible evolutionary histories of the analysed sequences, whereas a tree presents a single evolutionary relationship. Networks are especially useful for studying markers with a high level of homoplasy (the same mutation occurring more than once during evolution), such as the control region of mitochondrial DNA (mtDNA), because the researcher does not need to commit to a single explanation for the evolution suggested by the data. In many instances, however, trees are required. One such case is the founder analysis methodology, which aims to estimate migration times of human populations through history and prehistory. Currently, founder analysis involves building networks from which the researcher extracts a probable tree by hand, a time-consuming process that is prone to errors and to ambiguous decisions. To automate the founder analysis methodology, we present an algorithm that extracts a single probable tree from a network in a fast, systematic way.

Marco Alves, João Alves, Rui Camacho, Pedro Soares, Luísa Pereira
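The abstract above does not specify how a single tree is chosen from the network. One simple, systematic option, shown here as an assumption rather than the authors' algorithm, is a minimum spanning tree that breaks every cycle while keeping the edges with the fewest mutations (Kruskal's algorithm with union-find). The haplotype names and mutation counts are hypothetical, and the resulting tree is unrooted.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm: keep the cheapest edges that do not close a cycle.

    `edges` is a list of (weight, u, v) tuples; the weight could be, for
    example, the number of mutations separating two haplotypes.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v not yet connected, so this edge adds no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical reticulated network: H1-H2-H3-H4 form a cycle.
nodes = ["H1", "H2", "H3", "H4"]
edges = [(1, "H1", "H2"), (1, "H2", "H3"), (2, "H3", "H4"), (1, "H4", "H1")]
print(minimum_spanning_tree(nodes, edges))
# [('H1', 'H2', 1), ('H2', 'H3', 1), ('H4', 'H1', 1)]
```

The two-mutation edge H3-H4 is the one dropped to break the cycle; ties between equal-weight edges are resolved by tuple order here, which a real tool would replace with a biologically motivated criterion.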

Procedure for Detection of Membranes in Three-Dimensional Subcellular Density Maps

Electron tomography is the leading technique for visualizing the cell environment in molecular detail. Interpretation of the three-dimensional (3D) density maps is, however, hindered by factors such as noise and crowding at the subcellular level. Although several approaches have been proposed to facilitate segmentation of 3D structures, none has prevailed as a generic method, and manual annotation is therefore still a common choice in the field. In this work we introduce a novel procedure to detect membranes. These structures define the natural limits of compartments within biological specimens, so their detection is a step towards automated segmentation. Our method is based on local differential structure and on a Gaussian-like membrane model. We have tested our procedure on tomograms obtained under different experimental conditions.

A. Martinez-Sanchez, I. Garcia, J. J. Fernandez

A Cellular Automaton Model for Tumor Growth Simulation

We used cellular automata to simulate tumor growth in a multicellular system. Cells have a genome associated with different cancer hallmarks, indicating whether those hallmarks are activated as a consequence of mutations. The presence of the cancer hallmarks defines cell states and cell mitotic behaviors. These hallmarks are associated with a series of parameters, and depending on their values and on the activation of the hallmarks in each cell, the system can evolve towards different dynamics. We focus here on how the cellular automata simulation tool can provide a model of tumor growth behavior under different conditions.

Ángel Monteagudo, José Santos
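A toy version of the cellular-automaton idea: each occupied grid site holds a cell whose genome is a set of active hallmarks, and a cell divides into a free neighbouring site with a probability that depends on those hallmarks. The single "growth" hallmark, the probabilities, the von Neumann neighbourhood and the absence of cell death are simplifying assumptions of this sketch; the paper's model uses several hallmarks and parameters not reproduced here.

```python
import random

EMPTY = None  # marker for an unoccupied grid site


def step(grid, rng, p_divide=0.3, p_mutation=0.05):
    """One asynchronous update sweep (random order) of a toy tumour-growth CA.

    Each occupied site holds a frozenset of active hallmarks (the cell's genome).
    A cell divides into a random empty von Neumann neighbour; cells carrying the
    hypothetical "growth" hallmark divide twice as often, and a daughter can
    acquire that hallmark by mutation. Cells born in this sweep do not divide
    until the next one.
    """
    n = len(grid)
    cells = [(i, j) for i in range(n) for j in range(n) if grid[i][j] is not EMPTY]
    rng.shuffle(cells)  # random update order avoids directional bias
    for i, j in cells:
        genome = grid[i][j]
        rate = p_divide * (2 if "growth" in genome else 1)
        if rng.random() >= rate:
            continue
        free = [(i + di, j + dj)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < n and 0 <= j + dj < n
                and grid[i + di][j + dj] is EMPTY]
        if not free:
            continue  # no room to divide (contact inhibition)
        ni, nj = rng.choice(free)
        mutated = rng.random() < p_mutation
        grid[ni][nj] = genome | {"growth"} if mutated else genome
    return grid


rng = random.Random(1)
size = 21
grid = [[EMPTY] * size for _ in range(size)]
grid[size // 2][size // 2] = frozenset()  # one seed cell with no active hallmarks
for _ in range(30):
    step(grid, rng)
population = sum(cell is not EMPTY for row in grid for cell in row)
print("cells after 30 steps:", population)
```

Adding further hallmarks (for example, evasion of apoptosis with a death probability) would follow the same pattern: each hallmark modulates one of the per-cell probabilities.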

Ectopic Foci Study on the Crest Terminalis in 3D Computer Model of Human Atrial

Atrial fibrillation (AF) is the most common arrhythmia in clinical practice. Epidemiological studies show that AF tends to persist over time, producing electrophysiological and anatomical changes known as atrial remodeling. These changes have been shown to alter conduction velocity (CV) in the atrial tissue. In this study, the changes caused by electrical remodeling were incorporated into an action potential (AP) model of atrial myocytes, coupled with an anatomically realistic three-dimensional model of a dilated human atrium. By simulating AP propagation under anatomical remodeling, electrical remodeling and gap-junction remodeling, we measured the vulnerable windows for reentry generation on the crest terminalis of the atrium. The results indicate that, with gap-junction remodeling, the vulnerable window shifted by 38 ms with respect to the dilated model, which shows the impact of structural remodeling. Several types of permanent reentry, in figure-of-eight and rotor forms, favored by the underlying anatomy of the atrium, were obtained.

Carlos A. Ruiz-Villa, Andrés P. Castaño, Andrés Castillo, Elvio Heidenreich

SAD_BaSe: A Blood Bank Data Analysis Software

The main goal of this project was to build a Web-based information system, SAD_BaSe, that monitors blood donations and the blood production chain in a user-friendly way. In particular, the system keeps track of several data indicators and supports their analysis, enabling the definition of collection and production strategies and the measurement of quality indicators required by the Quality Management System of blood establishments. Data mining supports the analysis of donor eligibility criteria.

Augusto Ramoa, Salomé Maia, Anália Lourenço

A Rare Disease Patient Manager

The personal health implications of rare diseases are seldom considered in widespread medical care. The low incidence rate and complex treatment process make rare disease research an underrated field in the life sciences. However, it is in these particular conditions that the strongest relations between genotypes and phenotypes are identified. The rare disease patient manager detailed in this manuscript presents an innovative perspective: a patient-centric portal integrating genetic and medical data. With this strategy, patients' digital records are transparently integrated and connected to wet-lab genetics research in a seamless working environment. The resulting knowledge base offers multiple data views, geared towards medical staff, with patient treatment and monitoring data; genetics researchers, through a custom locus-specific database; and patients, who for once play an active role in their treatment and in rare disease research.

Pedro Lopes, Rafael Mendonça, Hugo Rocha, Jorge Oliveira, Laura Vilarinho, Rosário Santos, José Luís Oliveira

MorphoCol: A Powerful Tool for the Clinical Profiling of Pathogenic Bacteria

Pathogenicity, virulence and resistance of infection-causing bacteria are noteworthy problems in clinical settings, even after disinfection practices and antibiotic courses. Although it is common knowledge that these traits are associated with phenotypic and genetic variations, recent studies indicate that colony morphology variations are a sign of increased bacterial resistance to antimicrobial agents (i.e. antibiotics and disinfectants) and of altered virulence and persistence.

The ability to search for and compare similar phenotypic appearances within and across species is believed to have vast potential in medical diagnosis and clinical decision making. Therefore, we are developing a novel phenotypic ontology, the Colony Morphology Ontology (CMO), to share knowledge on the colony morphology variations of infection-causing bacteria. A study on the morphological variations of Pseudomonas aeruginosa and Staphylococcus aureus strains, two pathogenic bacteria associated with nosocomial infections, supported the development of CMO. We are also developing a new Web-based framework for the modelling and analysis of biofilm phenotypic signatures, supported by the CMO. This framework, named MorphoCol, will enable data integration and interoperability across research groups and other biological databases.

Ana Margarida Sousa, Anália Lourenço, Maria Olívia Pereira

Applying AIBench Framework to Develop Rich User Interfaces in NGS Studies

AIBench is a Java application framework for building translational software in Biomedicine. In this paper, we show how to use this framework to develop rich user interfaces in next-generation sequencing (NGS) experiments. In particular, we present PileLineGUI, a desktop environment for handling genome position files in next-generation studies based on the PileLine command-line toolbox.

Hugo López-Fernández, Daniel Glez-Peña, Miguel Reboiro-Jato, Gonzalo Gómez-López, David G. Pisano, Florentino Fdez-Riverola

Comparing Bowtie and BWA to Align Short Reads from a RNA-Seq Experiment

High-throughput sequencing technologies are a significant innovation that can contribute to important advances in genetic research. In recent years, many algorithms have been developed to align the large number of short nucleotide sequences generated by these technologies. Choosing among the available alignment algorithms is difficult; to assist this decision we evaluate several algorithms for mapping RNA-Seq data. The comparison was carried out in two phases. An initial phase narrowed the comparison down to the three algorithms implemented in the tools ELAND, Bowtie and BWA. A second phase compared the tools in terms of runtime, alignment coverage and process control.

N. Medina-Medina, A. Broka, S. Lacey, H. Lin, E. S. Klings, C. T. Baldwin, M. H. Steinberg, P. Sebastiani

SAMasGC: Sequencing Analysis with a Multiagent System and Grid Computing

Advances in bioinformatics have contributed to a significant increase in available information. Analysing this information requires distributed computing systems to carry out the data analysis efficiently. This study proposes a multiagent system that incorporates grid technology to facilitate distributed data analysis by dynamically incorporating the roles associated with each specific case study. The system was applied to genetic sequencing data to extract relevant information about insertions, deletions or polymorphisms.

Roberto González, Carolina Zato, Rocío Benito, María Hernández, Jesús M. Hernández, Juan F. De Paz

Exon: A Web-Based Software Toolkit for DNA Sequence Analysis

Recent advances in DNA sequencing methodologies have caused an exponential growth of publicly available genomic sequence data. Consequently, many computational biologists have intensified their studies in order to understand the content of these sequences and, in some cases, to search for associations with disease. However, the lack of publicly available tools is an issue, especially with regard to efficiency and usability. In this paper, we present Exon, a user-friendly solution containing tools for the online analysis of DNA sequences through compression-based profiles.

Diogo Pratas, Armando J. Pinho, Sara P. Garcia
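One common way to build compression-based profiles of DNA uses finite-context models: the cost, in bits, of encoding each base given the preceding k bases. The sketch below, assuming an adaptive order-2 model with additive smoothing (alpha = 1) over the ACGT alphabet, computes such a per-base profile; it illustrates the concept only and is not Exon's implementation.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumes uppercase A/C/G/T input only


def information_profile(seq, k=2, alpha=1.0):
    """Per-base information content (bits) under an adaptive order-k model.

    Counts of each symbol after each k-symbol context are updated as the
    sequence is read: P(x | ctx) = (n(x, ctx) + alpha) / (n(ctx) + 4 * alpha).
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    profile = []
    for i in range(k, len(seq)):
        ctx, sym = seq[i - k:i], seq[i]
        table = counts[ctx]
        total = sum(table.values())
        p = (table[sym] + alpha) / (total + len(ALPHABET) * alpha)
        profile.append(-math.log2(p))
        table[sym] += 1  # update after coding, as a decoder could
    return profile


repetitive = information_profile("ACGT" * 8)
random_like = information_profile("AGTCCATGGACTTAGCAGTACGATCCGTAAGT")
print(sum(repetitive) / len(repetitive), sum(random_like) / len(random_like))
```

Low-information stretches flag regions the model predicts well, such as repeats; the periodic sequence averages far fewer bits per base than the irregular one.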

On the Development of a Pipeline for the Automatic Detection of Positively Selected Sites

In this paper we present the ADOPS (Automatic Detection Of Positively Selected Sites) software that is ideal for research projects involving the analysis of tens of genes. ADOPS is a novel software pipeline that is being implemented with the goal of providing an automatic and flexible tool for detecting positively selected sites given a set of unaligned nucleotide sequence data.

David Reboiro-Jato, Miguel Reboiro-Jato, Florentino Fdez-Riverola, Nuno A. Fonseca, Jorge Vieira

Compact Representation of Biological Sequences Using Set Decision Diagrams

Nowadays, the exponential growth in the availability of biological sequences, and the complexity of the computational methods that take them as input, motivate research into new compact representations. To this end, we propose an alternative method for storing sets of sequences based on set decision diagrams instead of classical compression techniques. Set decision diagrams are an extension of reduced ordered binary decision diagrams, a graph data structure used as a compact symbolic representation of sets or of relations between sets. Experiments with genes of mitochondrial DNA support the feasibility of our approach.

José Ignacio Requeno, José Manuel Colom
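Set decision diagrams build on the node-sharing idea of reduced ordered BDDs: identical sub-structures are stored once through a unique table. The sketch shows that idea on a simplified multi-valued decision diagram for a set of equal-length sequences; it is not an SDD implementation (SDDs additionally allow hierarchical arcs labelled with sets), and the class and method names are hypothetical.

```python
TERMINAL = 0  # id of the accepting terminal node


class Diagram:
    """Reduced, ordered decision diagram over equal-length sequences.

    Each internal node maps symbols to child node ids. A unique table
    (hash-consing) guarantees that identical sub-diagrams are stored once,
    which is where the compaction comes from.
    """

    def __init__(self):
        self.unique = {}  # frozenset of (symbol, child) pairs -> node id
        self.nodes = {}   # node id -> {symbol: child id}

    def make(self, arcs):
        key = frozenset(arcs.items())
        if key not in self.unique:
            node_id = len(self.nodes) + 1  # 0 is reserved for TERMINAL
            self.unique[key] = node_id
            self.nodes[node_id] = dict(arcs)
        return self.unique[key]

    def build(self, sequences):
        """Return the root id for a non-empty set of equal-length sequences."""
        if all(len(s) == 0 for s in sequences):
            return TERMINAL
        groups = {}
        for s in sequences:
            groups.setdefault(s[0], set()).add(s[1:])
        return self.make({sym: self.build(rest) for sym, rest in groups.items()})

    def contains(self, root, seq):
        node = root
        for sym in seq:
            if node == TERMINAL or sym not in self.nodes[node]:
                return False
            node = self.nodes[node][sym]
        return node == TERMINAL


d = Diagram()
root = d.build({"ACGT", "AGGT", "TCGT", "TGGT"})
print(len(d.nodes), d.contains(root, "AGGT"), d.contains(root, "ACGA"))  # 4 True False
```

The four sequences share their suffixes, so the diagram needs four internal nodes where a plain trie would need eleven. Recursion depth equals sequence length, so long sequences would call for an iterative construction.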

Computational Tools for Strain Optimization by Adding Reactions

This paper introduces a new plug-in for the OptFlux Metabolic Engineering platform, aimed at finding suitable sets of reactions to add to the genomes of microbes (wild type strain), as well as finding complementary sets of deletions, so that the mutant becomes able to overproduce compounds with industrial interest, while preserving their viability. The optimization methods used are Evolutionary Algorithms and Simulated Annealing. The usefulness of this plug-in is demonstrated by a case study, regarding the production of vanillin by the bacterium E. coli.

Sara Correia, Miguel Rocha

Computational Tools for Strain Optimization by Tuning the Optimal Level of Gene Expression

In this work, a plug-in for the OptFlux Metabolic Engineering platform is presented, implementing methods that allow the identification of sets of genes to over- or under-express relative to their wild-type levels. The optimization methods used are Simulated Annealing and Evolutionary Algorithms, working with a novel representation and operators. This overcomes the limitations of previous approaches based solely on gene knockouts, opening new avenues for Biotechnology and fostering the discovery of genetic manipulations able to increase the production of certain compounds using a host microbe. The plug-in is made freely available together with appropriate documentation.

Emanuel Gonçalves, Isabel Rocha, Miguel Rocha

Efficient Verification for Logical Models of Regulatory Networks

The logical framework allows for the qualitative analysis of complex regulatory networks that control cellular processes. However, the study of large models is still hampered by the combinatorial explosion of the number of states. In this manuscript we present our work on analysing logical models of regulatory networks using model checking techniques. We propose a symbolic encoding (using NuSMV) of logical regulatory graphs that also considers priority classes. To reduce the state space, we further label the transitions with the values of the input components. This encoding has been implemented as an export facility in GINsim, a software tool dedicated to logical models. The potential of our symbolic encoding is illustrated through the analysis of the segment-polarity module involved in Drosophila embryo segmentation.

Pedro T. Monteiro, Claudine Chaouiya

Tackling Misleading Peptide Regulation Fold Changes in Quantitative Proteomics

Relative quantification in proteomics is a common strategy to analyze differences in biological samples and time series experiments. However, the resulting fold changes can give a wrong picture of the peptide amounts contained in the compared samples.

Fold changes hide the actual amounts of peptides. In addition, post-translational modifications can be redistributed over multiple peptides that cover the same protein sequence and are detected by mass spectrometry.

To circumvent these effects, a method was established to estimate the involved peptide amounts. The estimation of the theoretical peptide amount is based on the behavior of the peptide fold changes, in which lower peptide amounts are more susceptible to quantitative changes in a given sequence segment.

This method was successfully applied to a time-resolved analysis of growth receptor signaling in human prostate cancer cells. The theoretical peptide amounts show that high peptide fold changes can easily be nullified by the effects stated above.

Christoph Gernert, Evelin Berger, Frank Klawonn, Lothar Jänsch

Coffee Transcriptome Visualization Based on Functional Relationships among Gene Annotations

Simplified visualization and construction of gene networks is one of the current bioinformatics challenges now that thousands of gene models are being described in an organism's genome. Bioinformatics tools such as BLAST and Interproscan build connections between sequences and potential biological functions through search, alignment and annotation based on heuristic comparisons that use knowledge previously obtained from other sequences. This work describes the search for functional relationships among a set of selected annotations, chosen by the quality of the sequence comparison as defined by the coverage, the identity and the length of the query, when coffee transcriptome sequences were compared against the reference databases UNIREF 100, Interpro, PDB and PFAM. Term descriptors for molecular biology and biochemistry were used along with the WordNet dictionary to construct a Resource Description Framework (RDF) that enabled the discovery of associations between annotations. Sequence-annotation relationships were graphically represented through a total of 6845 oriented vectors. A large gene network connecting transcripts through relational concepts was created with over 700 non-redundant annotations, which remain to be validated with biological activity data such as microarrays and RNA-seq. This tool facilitates the visualization of complex and abundant transcriptome data, opens the possibility of complementing genomic information for data mining purposes, and generates new knowledge for metabolic pathway analysis.

Luis F. Castillo, Oscar Gómez-Ramírez, Narmer Galeano-Vanegas, Luis Bertel-Paternina, Gustavo Isaza, Álvaro Gaitán-Bustamante

