## About this book

This volume presents recent practical applications of Computational Biology and Bioinformatics. It contains the proceedings of the 9th International Conference on Practical Applications of Computational Biology & Bioinformatics, held at the University of Salamanca, Spain, on June 3rd-5th, 2015. The International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB) is an annual international meeting dedicated to emerging and challenging applied research in Bioinformatics and Computational Biology. Biological and biomedical research are increasingly driven by experimental techniques that challenge our ability to analyse, process and extract meaningful knowledge from the underlying data. The impressive capabilities of next generation sequencing technologies, together with novel and ever evolving types of omics data technologies, have posed an increasingly complex set of challenges for the growing fields of Bioinformatics and Computational Biology. The analysis of the datasets produced and their integration call for new algorithms and approaches from fields such as Databases, Statistics, Data Mining, Machine Learning, Optimization, Computer Science and Artificial Intelligence. Clearly, Biology is increasingly a science of information that requires tools from the computational sciences.

## Table of contents

### A Preliminary Assessment of Three Strategies for the Agent-Based Modeling of Bacterial Conjugation

Bacterial conjugation is a form of cell-cell communication by which neighboring cells transmit circular DNA strands called plasmids. The transmission of these plasmids has traditionally been modeled using differential equations. Recently, agent-based systems with spatial resolution have emerged as a promising tool, which we use in this work to assess three different schemes for modeling bacterial conjugation. The three schemes differ mainly in the point of the cell cycle at which conjugation is most likely to happen. The first alternative is to allow a conjugative event to occur as soon as a suitable recipient is found, the second is to make conjugation equally likely to happen throughout the cell cycle, and the third is to assume that conjugation is more likely to occur at a specific point late in the cell cycle.
Antonio Prestes García, Alfonso Rodríguez-Patón

### Carotenoid Analysis of Cassava Genotypes Roots (Manihot Esculenta Crantz) Cultivated in Southern Brazil Using Chemometric Tools

Manihot esculenta roots rich in β-carotene are an important staple food for populations at risk of vitamin A deficiency. Cassava genotypes with high pro-vitamin A activity have been identified as a strategy to reduce the prevalence of deficiency of this vitamin. In this study, the metabolomic characterization of ten cassava genotypes cultivated in southern Brazil, focusing on their carotenoid composition, was performed by UV-visible scanning spectrophotometry and reverse phase-high performance liquid chromatography. The data set was used for the construction of a descriptive model by chemometric analysis. The genotypes of yellow roots were clustered by the higher concentrations of cis-β-carotene and lutein. Inversely, cream roots genotypes were grouped precisely due to their lower concentrations of these pigments, as samples rich in lycopene differed among the studied genotypes. The analytical approach used (UV-Vis, HPLC, and chemometrics) proved efficient for understanding the chemodiversity of cassava genotypes, allowing them to be classified according to important features for human health and nutrition.
Rodolfo Moresco, Virgílio G. Uarrota, Aline Pereira, Maíra Tomazzoli, Eduardo da C. Nunes, Luiz Augusto Martins Peruch, Christopher Costa, Miguel Rocha, Marcelo Maraschin

### UV-Visible Scanning Spectrophotometry and Chemometric Analysis as Tools to Build Descriptive and Classification Models for Propolis from Southern Brazil

Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins supplemented with salivary enzymes, beeswax, and pollen. Recent studies classified Brazilian propolis into 12 groups based on physicochemical characteristics and different botanical origins. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by (a)biotic factors over the seasons, unraveling the effect of the harvest season on the chemical profile of propolis is an issue of recognized importance. For that, fast, cheap, and robust analytical techniques seem to be the best choice for large scale quality control processes in the most demanding markets, e.g., human health applications. UV-Visible (UV-Vis) scanning spectrophotometry meets those prerequisites and was adopted, affording a spectral dataset containing the chemical profiles of hydroalcoholic extracts of sixty-five propolis samples collected over the distinct seasons of the year 2014 in southern Brazil. Descriptive and classification models were built following a chemometric approach, i.e. principal component analysis (PCA) and hierarchical clustering analysis (HCA), by using bioinformatics tools supported by scripts written in the R language. The spectrophotometric profile approach associated with chemometric analyses made it possible to identify a distinct pattern in propolis samples produced during the summer compared with the other seasons. Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280−350 nm), suggesting that besides the biological activities presented by those secondary metabolites, they are also relevant for the discrimination and classification of that complex matrix through bioinformatics tools.
Maíra M. Tomazzoli, Remi D. Pai Neto, Rodolfo Moresco, Larissa Westphal, Amélia R. S. Zeggio, Leandro Specht, Christopher Costa, Miguel Rocha, Marcelo Maraschin

### UV-Visible Spectrophotometry-Based Metabolomic Analysis of Cedrela Fissilis Velozzo (Meliaceae) Calluses - A Screening Tool for Culture Medium Composition and Cell Metabolic Profiles

In plant cell cultures aiming at the production of secondary metabolites of industrial interest, the culture medium composition is a decisive step for obtaining cell growth and high yields of the target compound(s). A rapid and reliable methodology for screening metabolic responses to medium composition is fundamental for the development of this biotechnological field. Following this approach, UV-Vis scanning spectrophotometry of callus extracts and their spectra pre-processing, univariate and multivariate analysis were tested in the present work. The results obtained successfully discriminated the culture media investigated and shed light on what metabolic pathways might be responsible for the differences among the callus cultures’ metabolic profiles.
Fernanda Kokowicz Pilatti, Christopher Costa, Miguel Rocha, Marcelo Maraschin, Ana Maria Viana

### An Integrated Computational Platform for Metabolomics Data Analysis

The field of metabolomics, one of the omics technologies that have recently revolutionized biological research, poses multiple challenges for data analysis, which have been addressed by several computational tools. However, none addresses the multiplicity of existing techniques and data analysis tasks. Here, we propose a novel R package that provides a set of functions for metabolomics data analysis, including data loading in different formats, pre-processing, univariate and multivariate data analysis, machine learning and feature selection. The package supports the analysis of data from the main experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment, promoting the rapid development and sharing of data analysis pipelines.
Christopher Costa, Marcelo Maraschin, Miguel Rocha

### Compound Identification in Comprehensive Gas Chromatography—Mass Spectrometry-Based Metabolomics by Blind Source Separation

Comprehensive gas chromatography - mass spectrometry (GCxGC-MS) has become a promising tool in metabolomics. However, algorithms for GCxGC-MS data processing are needed in order to automatically process the data and extract the purest information about the compounds appearing in complex biological samples. This study shows the capability of orthogonal signal deconvolution (OSD), a novel algorithm based on blind source separation, to extract the spectra of the compounds appearing in GCxGC-MS samples. Results include a comparison between OSD and multivariate curve resolution - alternating least squares (MCR-ALS) in the extraction of metabolite spectra from a human serum sample analyzed through GCxGC-MS. This study concludes that OSD is a promising alternative for GCxGC-MS data processing.
Xavier Domingo-Almenara, Alexandre Perera, Noelia Ramírez, Jesus Brezmes

### Dolphin 1D: Improving Automation of Targeted Metabolomics in Multi-matrix Datasets of $$^1$$H-NMR Spectra

Nuclear magnetic resonance (NMR) is one of the main tools applied in the field of metabolomics. Extracting all the valuable information from large datasets of $$^1$$H-NMR spectra is a huge challenge for high throughput metabolomics analysis. The tools that currently exist to improve signal assignment and metabolite quantification do not have the versatility to allow the quantification of unknown signals or the choice of different quantification approaches in the same analysis. Moreover, graphical features and informative outputs are needed in order to be aware of the reliability of the final results in a field where position shifting, baseline masking and signal overlap may produce errors between samples. Here we present a software package called Dolphin 1D, whose aim is to improve targeted metabolite analysis in large datasets of $$^1$$H-NMR spectra by combining user interactivity with automatic algorithms. Its performance has been tested on a multi-matrix set composed of total serum, urine, liver aqueous extracts and brain aqueous extracts of rat. Our strategy aims to offer a useful solution for every kind of matrix, avoiding black-box processes and user-to-user subjectivity in automatic signal quantification.
Josep Gómez, Maria Vinaixa, Miguel A. Rodríguez, Reza M. Salek, Xavier Correig, Nicolau Cañellas

### A New Dimensionality Reduction Technique Based on HMM for Boosting Document Classification

Many classification problems, such as text classification, require the ability to handle the high dimensionality of a structured representation of the documents. The enormous size of the data would result in burdensome computations. Consequently, there is a strong need to reduce the quantity of handled information in order to carry out the classification process. In this paper, we propose a dimensionality reduction technique for text datasets based on a clustering method to group documents, with a simple Hidden Markov Model to represent them. We have applied the new method to the OHSUMED benchmark text corpus using the $$k$$-NN and SVM classifiers. The results obtained are very satisfactory and demonstrate the suitability of the proposed technique for the problem of dimensionality reduction and document classification.
A. Seara Vieira, E. L. Iglesias, L. Borrajo

### Diagnostic Knowledge Extraction from MedlinePlus: An Application for Infectious Diseases

In the creation of diagnostic decision support systems (DDSS) it is crucial to have validated and precise knowledge in order to create accurate systems. Typically, medical experts are the source of this knowledge, but it is not always possible to obtain all the desired information from them. Another valuable source could be medical books or articles describing the diagnosis of diseases managed by the DDSS, but again, it is not easy to extract this information. In this paper we present the results of our research, in which we have used Web scraping and a combination of natural language processing techniques to extract diagnostic criteria from MedlinePlus articles about infectious diseases.
Alejandro Rodríguez-González, Marcos Martínez-Romero, Roberto Costumero, Mark D. Wilkinson, Ernestina Menasalvas-Ruiz

### A Text Mining Approach for the Extraction of Kinetic Information from Literature

Systems biology has fostered interest in the use of kinetic models to better understand the dynamic behavior of metabolic networks in a wide variety of conditions. Unfortunately, in most cases, data available in different databases are not sufficient for the development of such models, since a significant part of the relevant information is still scattered in the literature. Thus, it becomes essential to develop specific and powerful text mining tools towards this aim. In this context, this work has as main objective the development of a text mining tool to extract, from scientific literature, kinetic parameters, their respective values and their relations with enzymes and metabolites. The pipeline proposed integrates the development of a novel plug-in over the text mining tool @Note2. Overall, the results validate the developed approach.
Ana Alão Freitas, Hugo Costa, Miguel Rocha, Isabel Rocha

### A Novel Search Engine Supporting Specific Drug Queries and Literature Management

The growing concern about acquired microbial resistance is promoting the publication of a large number of clinical and biological antimicrobial studies. Most of these publications can be obtained by searching the PubMed database, but the broad scope and huge size of this collection make the search challenging and time consuming. This paper presents an advanced search engine for the screening of up-to-date information on drug-related experimental studies. The main contributions lie in the resource-oriented architecture and the semantic analysis of the documents. The RESTful API enables the use of the searchable collection by different user interfaces, whereas text mining tools support domain-specific document labeling, scoring and indexing. A small search engine demo indexing articles on antimicrobial peptide research is available at http://sing.ei.uvigo.es/sds/. The source code is also accessible from the same homepage and freely available under the MIT License.
Alberto G. Jácome, Florentino Fdez-Riverola, Anália Lourenço

### Identification of a Putative Ganoderic Acid Pathway Enzyme in a Ganoderma Australe Transcriptome by Means of a Hidden Markov Model

Ganoderma australe is a fungus widely used as a traditional medicine, mainly in Eastern countries, but not yet studied in silico at the genomic level. This species is probably related to other well characterized fungi with similar properties, which may facilitate gene finding through comparative molecular analysis using appropriate bioinformatics tools. This paper aims to present a preliminary analysis of a G. australe transcriptome through computational biology techniques implementing Hidden Markov Models (HMM) in order to predict a key putative enzyme (lanosterol synthase, EC 5.4.99.7) involved in the metabolic pathway of triterpenoids of therapeutic interest. The findings suggest that the HMM approach is more efficient than traditional homology comparisons based on multiple sequence alignment methods. Here we report the first evidence of a putative lanosterol synthase protein being expressed in cell cultures of G. australe.
Germán López-Gartner, Daniel Agudelo-Valencia, Sergio Castaño, Gustavo A. Isaza, Luis F. Castillo, Mariana Sánchez, Jeferson Arango

### A New Bioinformatic Pipeline to Address the Most Common Requirements in RNA-seq Data Analysis

Many bioinformatic programs have been developed to analyze data from RNA-seq experiments. These programs are widely used and often included in computational pipelines. Nevertheless, there does not seem to be a precise definition of what constitutes a proper workflow for this kind of data. We present here a new workflow that takes into account the most common requirements for RNA-seq analysis, and that is implemented as an automatic pipeline to perform an efficient and complete evaluation.
Osvaldo Graña, Miriam Rubio-Camarillo, Florentino Fdez-Riverola, David G. Pisano, Daniel Glez-Peña

### Microarray Gene Expression Data Integration: An Application to Brain Tumor Grade Determination

The World Health Organization ranks brain tumors in four grades, the fourth being the most aggressive. Glioblastoma, a fourth grade tumor, is one of the most severe human diseases and almost inevitably leads to death. Physicians address the classification in grades through direct inspection. Indeed, there is a need for good automatic predictors of tumor grade, which are not affected by human misclassification errors and that can be made with less invasive diagnostic tools. This work addresses the stages involved in the process of selecting a good tumor grade predictor based on microarray gene expression data. The integration of information from heterogeneous platforms is highlighted, showing the particularities of choosing approaches working at gene, transcript or probeset levels. Distinct machine learning algorithms and integration methods are tested, analyzing their ability to produce a good set of predictors for tumor grade.
Eduardo Valente, Miguel Rocha

### Obtaining Relevant Genes by Analysis of Expression Arrays with a Multi-agent System

Triple negative breast cancer (TNBC) is an aggressive form of breast cancer. Despite treatment with chemotherapy, relapses are frequent and response to these treatments is not the same in younger women as in older women. Therefore, the identification of genes that provoke this disease is required, as well as the identification of therapeutic targets. There are currently different hybridization techniques, such as expression arrays, which measure the signal expression of both the genomic and transcriptomic levels of thousands of genes of a given sample. Probesets of Gene 1.0 ST GeneChip arrays provide the ultimate genome transcript coverage, providing a measurement of the expression level of the sample. This paper proposes a multi-agent system to manage information from expression arrays, with the goal of providing an intuitive system that is also extensible, in order to analyze and interpret the results. The agent roles integrate different types of techniques, from statistical and data mining techniques that select a set of genes, to search techniques that find pathways in which such genes participate, and information extraction techniques that apply a CBR system to check whether these genes are involved in the disease.
Alfonso González, Juan Ramos, Juan F. De Paz, Juan M. Corchado

### Erratum to: Diagnostic Knowledge Extraction from MedlinePlus: An Application for Infectious Diseases

Alejandro Rodríguez-González, Marcos Martínez-Romero, Roberto Costumero, Mark D. Wilkinson, Ernestina Menasalvas-Ruiz

### Backmatter
