
2015 | Book

Computational Intelligence Methods for Bioinformatics and Biostatistics

11th International Meeting, CIBB 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers

Edited by: Prof. Clelia Di Serio, Dr. Pietro Liò, Alessandro Nonis, Prof. Roberto Tagliaferri

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the thoroughly refereed post-conference proceedings of the 11th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2014, held in Cambridge, UK, in June 2014.
The 25 revised full papers presented were carefully reviewed and selected from 44 submissions. The papers focus on problems concerning computational techniques in bioinformatics, systems biology, medical informatics and biostatistics.

Table of Contents

Frontmatter
Erratum to: A New Feature Selection Methodology for K-mers Representation of DNA Sequences
Erratum to: Chapter “A New Feature Selection Methodology for K-mers Representation of DNA Sequences” in: C. di Serio et al. (Eds.): Computational Intelligence Methods for Bioinformatics and Biostatistics, LNCS, DOI: 10.1007/978-3-319-24462-4_9
The original version of this chapter contained an error. The names of the authors Giosuè Lo Bosco and Luca Pinello were inverted in the original publication. The original chapter was corrected.
Giosuè Lo Bosco, Luca Pinello

Regular Sessions

Frontmatter
GO-WAR: A Tool for Mining Weighted Association Rules from Gene Ontology Annotations
Abstract
The Gene Ontology (GO) is a controlled vocabulary of concepts (called GO Terms) structured on three main ontologies. Each GO Term contains a description of a biological concept that is associated with one or more gene products through a process also known as annotation. Each annotation may be derived using different methods, and an Evidence Code (EC) records this process. The importance and specificity of both GO terms and annotations are often measured by their Information Content (IC). Mining annotations and annotated data may extract meaningful knowledge from a biological standpoint. For instance, the analysis of these annotated data using association rules provides evidence for the co-occurrence of annotations. Nevertheless, classical association rule algorithms take into account neither the source nor the importance of annotations, yielding candidate rules with low IC. This paper presents a methodology for extracting Weighted Association Rules from GO, implemented in a tool named GO-WAR (Gene Ontology-based Weighted Association Rules). It is able to extract association rules with a high level of IC, without loss of Support and Confidence, from a dataset of annotated data. A case study applying GO-WAR to a publicly available GO annotation dataset demonstrates that our method outperforms current state-of-the-art approaches.
Giuseppe Agapito, Mario Cannataro, Pietro H. Guzzi, Marianna Milano
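As a rough illustration of the weighting idea behind GO-WAR (not the algorithm itself), the Information Content of a GO term is commonly derived from its annotation frequency; the gene products and GO identifiers below are toy placeholders.

```python
import math
from collections import Counter

# Hypothetical annotation corpus: gene product -> set of GO terms.
annotations = {
    "geneA": {"GO:0008150", "GO:0006915"},
    "geneB": {"GO:0008150", "GO:0006915", "GO:0042981"},
    "geneC": {"GO:0008150"},
}

# Count how many gene products each term annotates.
term_counts = Counter(t for terms in annotations.values() for t in terms)
n_products = len(annotations)

# IC(t) = -log2 p(t), where p(t) is the annotation frequency of term t.
# Rare (more specific) terms get a higher IC and hence a larger weight.
ic = {t: -math.log2(c / n_products) for t, c in term_counts.items()}
print(ic)
```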
Extended Spearman and Kendall Coefficients for Gene Annotation List Correlation
Abstract
Gene annotations are a key concept in bioinformatics, and computational methods able to predict them are a fundamental contribution to the field. Several machine learning algorithms are available in this domain; they include relevant parameters that might influence the output list of predicted gene annotations. The extent to which variation of these key parameters affects the output gene annotation lists remains an open question. Here, we provide support for such evaluation by introducing two list correlation measures; they are based on, and extend, the Spearman ρ correlation coefficient and the Kendall τ distance, respectively. The application of these measures to gene annotation lists, predicted from Gene Ontology annotation datasets of different organisms' genes, revealed interesting patterns among the predicted lists and allowed useful conclusions about the prediction parameters and algorithms used.
Davide Chicco, Eleonora Ciceri, Marco Masseroli
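For reference, the standard coefficients that these extensions build on can be computed with SciPy; the two rankings below are hypothetical, and SciPy returns the Kendall correlation rather than the Kendall distance the chapter extends.

```python
from scipy.stats import spearmanr, kendalltau

# Two hypothetical rankings of the same genes produced by two parameter settings.
ranks_run1 = [1, 2, 3, 4, 5, 6]
ranks_run2 = [2, 1, 3, 5, 4, 6]

rho, rho_p = spearmanr(ranks_run1, ranks_run2)   # Spearman rank correlation
tau, tau_p = kendalltau(ranks_run1, ranks_run2)  # Kendall rank correlation
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```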
Statistical Analysis of Protein Structural Features: Relationships and PCA Grouping
Abstract
Subtle structural differences among homologous proteins may be responsible for the modulation of their functional properties. Therefore, we are exploring novel and strengthened methods to investigate protein structure in depth and to analyze conformational features, in order to highlight relationships with functional properties. We selected some protein families from the CATH database on the basis of their different structural classes, and studied many structural parameters of these proteins in detail. Selected results from the Pearson correlation matrix were validated with a Student's t-distribution test at a 5% significance level. We investigated the strongest relationships among parameters in detail by using partial correlation. Moreover, PCA was applied both to single families and to all families together, in order to show how to find outliers within a family and to extract new combined features. The correctness of this approach was borne out by the agreement of our results with known or expected geometric and structural properties. In addition, we found previously unknown relationships, which will be the object of further studies to assess them as putative markers of the peculiar structure-function relationships of each family.
E. Del Prete, S. Dotolo, A. Marabotti, A. Facchiano
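A minimal sketch of the correlation-plus-PCA workflow on a hypothetical matrix of structural parameters (rows are proteins of one family, columns are parameters); it is not the authors' analysis pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical matrix: rows = proteins of one CATH family, columns = structural parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))

# Pearson correlation matrix between structural parameters.
corr = np.corrcoef(X, rowvar=False)

# PCA on standardized parameters; extreme component scores flag candidate outliers.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
dist = np.linalg.norm(scores, axis=1)
outliers = np.where(dist > dist.mean() + 2 * dist.std())[0]
print(corr.shape, outliers)
```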
Exploring the Relatedness of Gene Sets
Abstract
A key activity for life scientists is the exploration of the relatedness of a set of genes, in order to differentiate genes performing coherently related functions from randomly grouped genes. This paper considers exploring relatedness within two popular bio-organizations, namely gene families and pathways. This exploration is carried out by integrating different resources (ontologies, texts, expert classifications) and aims to suggest patterns that help biologists obtain a more comprehensive view of differences in gene behaviour. Our approach is based on the annotation of a specialized corpus of texts (the gene summaries) that condense the description of the functions and processes in which genes are involved. By annotating these summaries with different ontologies, a set of descriptor terms is derived and compared in order to obtain a measure of relatedness within the bio-organizations we consider. Finally, the most important annotations within each family are extracted using a text categorization method.
Nicoletta Dessì, Stefania Dessì, Emanuele Pascariello, Barbara Pes
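As a rough sketch of the comparison step, the relatedness of two genes can be scored by the Jaccard overlap of the descriptor-term sets derived from their annotated summaries; the terms used here are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two descriptor-term sets as a simple relatedness score."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical descriptor terms extracted from two annotated gene summaries.
gene1_terms = {"apoptosis", "kinase activity", "signal transduction"}
gene2_terms = {"apoptosis", "signal transduction", "cell cycle"}
print(f"relatedness = {jaccard(gene1_terms, gene2_terms):.2f}")
```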
Consensus Clustering in Gene Expression
Abstract
In data analysis, clustering is the process of finding groups in unlabelled data according to similarities among items, such that data items belonging to the same group are more similar to each other than to items in different groups. Consensus clustering is a methodology for combining different clustering solutions from the same data set into a new clustering, in order to obtain a more accurate and stable solution. In this work, we compared different consensus approaches in combination with different clustering algorithms and ran several experiments on gene expression data sets. We show that consensus techniques lead to an improvement in clustering accuracy and give evidence of the stability of the solutions obtained with these methods.
Paola Galdi, Francesco Napolitano, Roberto Tagliaferri
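A minimal sketch of one common consensus scheme, evidence accumulation via a co-association matrix, on simulated data; it is not necessarily one of the exact approaches compared in the chapter.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))          # hypothetical samples x genes expression matrix
n_runs, k = 30, 3

# Evidence accumulation: count how often each pair of samples is co-clustered.
coassoc = np.zeros((len(X), len(X)))
for seed in range(n_runs):
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_runs

# Final consensus partition from the co-association (similarity) matrix.
dist = squareform(1.0 - coassoc, checks=False)
consensus = fcluster(linkage(dist, method="average"), t=k, criterion="maxclust")
print(consensus)
```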
Automated Detection of Fluorescent Probes in Molecular Imaging
Abstract
Complex biological features at the molecular, organelle and cellular levels, which were traditionally evaluated and quantified visually by a trained expert, are now subjected to computational analytics. The use of machine learning techniques allows one to extend the computational imaging approach by considering various markers based on DNA, mRNA, microRNA (miRNA) and proteins that could be used for classification of disease taxonomy, response to therapy and patient outcome. One method employed to investigate these markers is Fluorescent In Situ Hybridization (FISH). FISH employs probes designed to hybridise to specific sequences of DNA in order to display the locations of regions of interest. We have developed a method to identify individual interphase nuclei and record the positions of different coloured probes attached to chromatin regions within these nuclei. Our method could be used for obtaining information such as pairwise distances between probes and inferring properties of chromatin structure.
Fiona Kathryn Hamey, Yoli Shavit, Valdone Maciulyte, Christopher Town, Pietro Liò, Sabrina Tosi
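A minimal sketch of the segmentation step using scikit-image, assuming a two-channel image (a DAPI-like nuclear channel plus one probe channel); the random arrays and thresholds are illustrative and not the authors' pipeline.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

rng = np.random.default_rng(2)
dapi = rng.random((256, 256))    # hypothetical nuclear (DAPI) channel
probe = rng.random((256, 256))   # hypothetical fluorescent probe channel

# Segment interphase nuclei by Otsu thresholding and connected-component labelling.
nuclei = label(dapi > threshold_otsu(dapi))

# Record probe spot centroids that fall inside a segmented nucleus.
spots = label(probe > np.percentile(probe, 99.5))
for region in regionprops(spots):
    r, c = map(int, region.centroid)
    if nuclei[r, c] > 0:
        print(f"probe spot at ({r}, {c}) in nucleus {nuclei[r, c]}")
```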
Applications of Network-based Survival Analysis Methods for Pathways Detection in Cancer
Abstract
Gene expression data from high-throughput assays, such as microarrays, are often used to predict cancer survival. Available datasets consist of a small number of samples (n patients) and a large number of genes (p predictors). Therefore, the main challenge is to cope with the high dimensionality. Moreover, genes are co-regulated and their expression levels are expected to be highly correlated. In order to address these two issues, network-based approaches can be applied. In our analysis, we compared the most recent network-penalized Cox models for high-dimensional survival data, aimed at determining the pathway structures and biomarkers involved in cancer progression.
Using these network-based models, we show how to obtain a deeper understanding of gene-regulatory networks and investigate the gene signatures related to prognosis and survival in different types of tumors. Comparisons are carried out on three different real cancer datasets.
Antonella Iuliano, Annalisa Occhipinti, Claudia Angelini, Italia De Feis, Pietro Lió
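For orientation, a plain elastic-net-penalized Cox model can be fitted with lifelines as below; the network-penalized models compared in the chapter add a graph-based penalty that this sketch omits, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 5)),
                  columns=[f"gene_{i}" for i in range(5)])  # hypothetical expression values
df["time"] = rng.exponential(10, size=100)                  # survival times
df["event"] = rng.integers(0, 2, size=100)                  # 1 = event observed

# Elastic-net penalty: `penalizer` sets the overall strength, `l1_ratio` the L1/L2 mix.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "p"]])
```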
Improving Literature-Based Discovery with Advanced Text Mining
Abstract
Automated Literature Based Discovery (LBD) generates new knowledge by combining what is already known in the literature. By facilitating large-scale hypothesis testing and generation from huge collections of literature, LBD could significantly support research in the biomedical sciences. However, the uptake of LBD by the scientific community has been limited. One of the key reasons for this is the limited nature of existing LBD methodology. Based on fairly shallow methods, current LBD captures only some of the information available in the literature. We discuss how advanced text mining based on information retrieval, natural language processing and data mining could open the door to much deeper, wider-coverage and more dynamic LBD, better capable of evolving with science, in particular when combined with sophisticated, state-of-the-art knowledge discovery techniques.
Anna Korhonen, Yufan Guo, Simon Baker, Meliha Yetisgen-Yildiz, Ulla Stenius, Masashi Narita, Pietro Liò
A New Feature Selection Methodology for K-mers Representation of DNA Sequences
Abstract
Decomposing a DNA sequence into k-mers and counting their frequencies defines a mapping of the sequence into a numerical space, represented by a numerical feature vector of fixed length. This simple process allows sequences to be compared in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition takes all substrings of a fixed length k, making the codomain of exponential dimension. This can clearly affect the time complexity of the similarity computation and, in general, of the machine learning algorithm used for sequence analysis. Moreover, the presence of noisy features can also affect classification accuracy. In this paper we propose a feature selection method able to select the most informative k-mers associated with a set of DNA sequences. The selection is based on the Motif Independent Measure (MIM), an unbiased quantitative measure of DNA sequence specificity that we recently introduced in the literature. Results computed on public datasets show the effectiveness of the proposed feature selection method.
Giosuè Lo Bosco, Luca Pinello
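A minimal sketch of the k-mer mapping step; the MIM-based selection itself is the chapter's contribution and is not reproduced here, and the sequences are toy examples.

```python
from itertools import product
from collections import Counter

def kmer_vector(seq, k=3):
    """Map a DNA sequence to its fixed-length k-mer frequency vector."""
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]  # 4**k features
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts[km] / total for km in alphabet]

seqs = ["ACGTACGTGG", "TTTACGTACG"]          # toy sequences
vectors = [kmer_vector(s) for s in seqs]
print(len(vectors[0]))                       # 64 features for k = 3
```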
Detecting Overlapping Protein Communities in Disease Networks
Abstract
In this work we propose a novel hybrid technique for overlapping community detection in biological networks, able to exploit both the available quantitative and semantic information, which we call the Semantically Enriched Fuzzy C-Means Spectral Modularity (SE-FSM) community detection method. We applied SE-FSM to the analysis of protein-protein interaction (PPI) networks of HIV-1 infection and leukemia in Homo sapiens. SE-FSM found significant overlapping biological communities. In particular, it found a strong relationship between HIV-1 and leukemia, as their communities share several significant pathways and biological functions.
Hassan Mahmoud, Francesco Masulli, Stefano Rovetta, Giuseppe Russo
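A rough sketch of the spectral-embedding-plus-soft-membership idea on a toy graph; it is not SE-FSM (there is no semantic enrichment, and the fuzzy step is approximated by softmax memberships computed from k-means centroids).

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

# Toy interaction graph standing in for a PPI network.
G = nx.karate_club_graph()
L = nx.normalized_laplacian_matrix(G).toarray()

# Spectral embedding: eigenvectors of the normalized Laplacian with smallest eigenvalues.
vals, vecs = np.linalg.eigh(L)
embedding = vecs[:, 1:4]

# Soft memberships from distances to k-means centroids; a node belongs to every
# community whose membership exceeds a threshold, allowing overlaps.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embedding)
d = np.linalg.norm(embedding[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
membership = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
overlapping = [np.where(m > 0.4)[0].tolist() for m in membership]
print(overlapping[:5])   # communities assigned to the first five nodes
```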
Approximate Abelian Periods to Find Motifs in Biological Sequences
Abstract
A problem that has been gaining importance in recent years is that of computing the Abelian periods of a string. A string w has an Abelian period p if it is a sequence of permutations of a length-p string. In this paper, we define an approximate variant of Abelian periods which allows variations between adjacent elements of the sequence. In particular, we compare two adjacent elements of the sequence using δ- and γ-metrics. We develop an algorithm for computing all the δγ-approximate Abelian periods of a string under two proposed definitions. We also show a preliminary application to the problem of identifying genes with periodic variations in their expression levels.
Juan Mendivelso, Camilo Pino, Luis F. Niño, Yoan Pinzón
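A minimal sketch of the exact (non-approximate) property for the variant without an initial partial block: every full length-p block must have the same Parikh vector, and the final partial block must be contained in it. The chapter's δγ-approximate definitions relax this comparison.

```python
from collections import Counter

def is_abelian_period(w, p):
    """Check the exact Abelian period property for string w and candidate period p."""
    blocks = [w[i:i + p] for i in range(0, len(w), p)]
    reference = Counter(blocks[0])
    # Every full inner block must be a permutation of the first block ...
    if any(Counter(b) != reference for b in blocks[1:-1]):
        return False
    # ... and the (possibly shorter) final block must be contained in it.
    last = Counter(blocks[-1])
    return all(last[c] <= reference[c] for c in last)

print([p for p in range(1, 7) if is_abelian_period("abaababa", p)])
```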
Sem Best Shortest Paths for the Characterization of Differentially Expressed Genes
Abstract
In recent years, systems and computational biology have focused their efforts on uncovering the causal relationships between observable perturbations of gene regulatory networks and human diseases. This problem becomes even more challenging when network models and algorithms have to take into account slightly significant effects, caused by often peripheral or unknown genes that cooperatively cause the observed disease phenotype. Many solutions, from community and pathway analysis to information flow simulation, have been proposed with the aim of reproducing biological regulatory networks and cascades directly from empirical data, such as gene expression microarray data. In this contribution, we propose a methodology to evaluate the most important shortest paths between differentially expressed genes in biological interaction networks, with no need for user-defined parameters or heuristic rules, enabling bias-free discovery and overcoming common issues affecting the most recent network-based algorithms.
Daniele Pepe, Fernando Palluzzi, Mario Grassi
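A minimal sketch of the shortest-path enumeration step with networkx on a toy interaction network; the chapter's scoring of the best paths is not reproduced, and the gene names and edges are illustrative.

```python
import networkx as nx

# Hypothetical protein interaction network and differentially expressed genes.
G = nx.Graph([("TP53", "MDM2"), ("MDM2", "AKT1"), ("AKT1", "EGFR"),
              ("TP53", "BRCA1"), ("BRCA1", "EGFR")])
de_genes = ["TP53", "EGFR"]

# All shortest paths between every pair of differentially expressed genes.
for i, src in enumerate(de_genes):
    for dst in de_genes[i + 1:]:
        for path in nx.all_shortest_paths(G, src, dst):
            print(src, "->", dst, ":", path)
```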
The General Regression Neural Network to Classify Barcode and mini-barcode DNA
Abstract
In the identification of living species through the analysis of their DNA sequences, the mitochondrial “cytochrome c oxidase subunit 1” (COI) gene has proved to be a good DNA barcode. Nevertheless, the quality of full-length barcode sequences often cannot be guaranteed because of DNA degradation in biological samples, so that only short sequences (mini-barcodes) are available. In this paper, a prototype-based classification approach for the analysis of DNA barcodes is proposed, exploiting a spectral representation of DNA sequences and a memory-based neural network. The neural network is a modified version of the General Regression Neural Network (GRNN) used as a classification tool. Furthermore, the relationship between the characteristics of different species and their spectral distribution is investigated. Namely, a subset of the whole spectrum of a DNA sequence, composed of very high frequency DNA k-mers, is considered, providing a robust system for the classification of barcode sequences. The proposed approach is compared with standard classification algorithms, such as Support Vector Machines (SVM), obtaining better results, especially when applied to mini-barcode sequences.
Riccardo Rizzo, Antonino Fiannaca, Massimo La Rosa, Alfonso Urso
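A minimal GRNN-style classifier (Gaussian-kernel-weighted voting over stored training patterns), sketched on generic feature vectors; the chapter's version works on k-mer spectral representations and includes further modifications.

```python
import numpy as np

def grnn_classify(X_train, y_train, x, sigma=1.0):
    """Memory-based classification: Gaussian-kernel-weighted vote of training patterns."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    classes = np.unique(y_train)
    scores = [weights[y_train == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]

rng = np.random.default_rng(4)
X_train = rng.normal(size=(20, 8))            # hypothetical k-mer spectra
y_train = np.array([0] * 10 + [1] * 10)       # species labels
print(grnn_classify(X_train, y_train, X_train[0]))
```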
Transcriptator: Computational Pipeline to Annotate Transcripts and Assembled Reads from RNA-Seq Data
Abstract
RNA-Seq is a new tool, which utilizes high-throughput sequencing to measure RNA transcript counts at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data coming out from RNA-Seq into biological knowledge is a problem, and biologist-friendly tools to analyze them are lacking. In our lab, we develop a Transcriptator web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basis Local Search Alignment Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and gene ontology annotation enrichment. It enables a biologist to identify enriched biological themes, particularly Gene Ontology (GO) terms related to biological process, molecular functions and cellular locations. It clusters the transcripts based on functional annotation and generates a tabular report for functional and gene ontology annotation for every single transcript submitted to our web server. Implementation of QuickGo web-services in our pipeline enable users to carry out GO-Slim analysis. Finally, it generates easy to read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. Web application is freely available at: www-labgtp.na.icar.cnr.it:8080/ Transcriptator.
Kumar Parijat Tripathi, Daniela Evangelista, Raffaele Cassandra, Mario R. Guarracino
Application of a New Ridge Estimator of the Inverse Covariance Matrix to the Reconstruction of Gene-Gene Interaction Networks
Abstract
A proper ridge estimator of the inverse covariance matrix is presented. We study the properties of this estimator in relation to other ridge-type estimators. In the context of Gaussian graphical modeling, we compare the proposed estimator to the graphical lasso. This work is a brief exposé of the technical developments in [1], focussing on applications in gene-gene interaction network reconstruction.
Wessel N. van Wieringen, Carel F. W. Peeters
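For orientation, the archetypal ridge-type estimator simply regularizes the sample covariance before inversion, as sketched below; the proper ridge estimator studied in the chapter is developed in the cited technical work and differs from this naive form.

```python
import numpy as np

def ridge_precision(X, lam=0.5):
    """Naive ridge-type estimate of the precision (inverse covariance) matrix."""
    S = np.cov(X, rowvar=False)                # sample covariance (genes as columns)
    p = S.shape[0]
    return np.linalg.inv(S + lam * np.eye(p))  # regularized inversion, well-defined even for p > n

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 50))                  # n = 20 samples, p = 50 genes
Omega = ridge_precision(X)
# Off-diagonal entries suggest candidate gene-gene (conditional) interactions.
print(Omega.shape)
```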

Special session: Computational Biostatistics For Data Integration In Systems Biomedicine

Frontmatter
Estimation of a Piecewise Exponential Model by Bayesian P-splines Techniques for Prognostic Assessment and Prediction
Abstract
Methods for fitting survival regression models with a penalized smoothed hazard function have recently been discussed, even though they can be cumbersome. A simpler alternative, which does not require specific software packages, is to fit a penalized piecewise exponential model. In this work the implementation of such a strategy in WinBUGS is illustrated, and preliminary results are reported concerning the application of Bayesian P-spline techniques. The technique is applied to a pre-specified model in which the number and positions of the knots were fixed on the basis of clinical knowledge, thus defining a non-standard smoothing problem.
Giuseppe Marano, Patrizia Boracchi, Elia M. Biganzoli
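A minimal frequentist sketch of the piecewise exponential likelihood via the standard Poisson data augmentation (one record per subject and interval, with a log-exposure offset); the Bayesian P-spline penalty and the WinBUGS implementation discussed in the chapter are not reproduced, and the data and knots are toy values.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical survival data and interval cut points (knots).
times = np.array([2.0, 5.0, 7.5, 9.0, 12.0])
events = np.array([1, 0, 1, 1, 0])
cuts = [0.0, 4.0, 8.0, 15.0]

# Expand each subject into one record per interval at risk (Poisson trick).
rows = []
for t, e in zip(times, events):
    for j in range(len(cuts) - 1):
        lo, hi = cuts[j], cuts[j + 1]
        if t <= lo:
            break
        rows.append({"interval": j, "exposure": min(t, hi) - lo,
                     "event": int(e and lo < t <= hi)})
df = pd.DataFrame(rows)

# Piecewise constant hazard: Poisson GLM with interval dummies and log-exposure offset.
X = pd.get_dummies(df["interval"], prefix="int", dtype=float)
fit = sm.GLM(df["event"], X, family=sm.families.Poisson(),
             offset=np.log(df["exposure"])).fit()
print(np.exp(fit.params))   # estimated hazard in each interval
```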
Use of q-values to Improve a Genetic Algorithm to Identify Robust Gene Signatures
Abstract
Several approaches have been proposed for the analysis of DNA microarray datasets, focusing on the performance and robustness of the final feature subsets. The novelty of this paper lies in the use of q-values to pre-filter the features of a DNA microarray dataset, identifying the most significant ones, and in the inclusion of this information in a genetic algorithm for further feature selection. The method is applied to a lung cancer microarray dataset, resulting in similar performance rates and greater robustness in terms of selected features (on average a 36.21% improvement in robustness) when compared to the results of the standard algorithm.
Daniel Urda, Simon Chambers, Ian Jarman, Paulo Lisboa, Leonardo Franco, Jose M. Jerez
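A minimal sketch of the pre-filtering step, using Benjamini-Hochberg adjusted p-values as a stand-in for q-values (Storey-type q-value estimation, which the chapter may use, differs); the expression matrix and threshold are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 500))           # hypothetical samples x probes expression matrix
y = np.array([0] * 20 + [1] * 20)        # class labels (e.g. tumour vs normal)

# Per-probe p-values from a two-sample t-test, then FDR adjustment.
pvals = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue for j in range(X.shape[1])])
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

# Only probes passing the q-value filter are handed to the genetic algorithm.
selected = np.where(qvals < 0.05)[0]
print(len(selected), "probes pre-selected")
```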

Special session: Computational Intelligence Methods for Drug Design

Frontmatter
Drug Repurposing by Optimizing Mining of Genes Target Association
Abstract
A major alternative strategy for the pharmaceutical industry is to find new uses for approved drugs. A number of studies have shown that the target binding of a drug often affects not only the intended disease-related genes but also other genes, leading to unexpected outcomes. Thus, if the perturbed genes are related to other diseases, this permits the repositioning of an existing drug. Our aim is to find hidden relations between drug targets and disease-related genes so as to generate hypotheses of new drug-disease pairs. Association Rule Mining (ARM) is a well-known data mining technique which is widely used for the discovery of interesting relations in large data sets. In this study we apply a new computational intelligence approach to 288 drugs and 267 diseases, forming 5018 known drug-disease pairs. Our method, which we call Grammatical Evolution ARM (GEARM), applies the GE optimization technique to the set of rules learned using ARM, which represent hidden relationships among gene targets. The results produced by this combination show a high accuracy of up to 95% for the extracted rules. Likewise, the suggested approach was able to discover interesting pairs of drugs and diseases with an accuracy of 92%. Some of these pairs have previously been reported in the literature, while others can serve as new hypotheses to be explored.
Aicha Boutorh, Naruemon Pratanwanich, Ahmed Guessoum, Pietro Liò
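A minimal association-rule-mining sketch with mlxtend on a toy drug/gene/disease incidence table; the grammatical evolution layer that GEARM adds on top of ARM is not shown, and all item names are hypothetical.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot table: each row is a drug, columns mark perturbed genes and linked diseases.
data = pd.DataFrame(
    [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]],
    columns=["geneA", "geneB", "diseaseX", "diseaseY"],
).astype(bool)

frequent = apriori(data, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```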
The Importance of the Regression Model in the Structure-Based Prediction of Protein-Ligand Binding
Abstract
Docking is a key computational method for structure-based design of starting points in the drug discovery process. Recently, the use of non-parametric machine learning to circumvent modelling assumptions has been shown to result in a large improvement in the accuracy of docking. As a result, these machine-learning scoring functions are able to widely outperform classical scoring functions. The latter are characterized by their reliance on a predetermined theory-inspired functional form for the relationship between the variables that characterise the complex and its predicted binding affinity.
In this paper, we demonstrate that the superior performance of machine-learning scoring functions comes from avoiding the functional form that all classical scoring functions assume. These scoring functions can now be directly applied to the docking poses generated by AutoDock Vina, which is expected to increase its accuracy. On the other hand, as it is well known that the assumption of additivity does not hold in some cases, it is expected that the described protocol will also improve other classical scoring functions, as has been the case with Vina. Lastly, the results suggest that incorporating ligand-only and protein-only properties into a model is a promising avenue for future research.
Hongjian Li, Kwong-Sak Leung, Man-Hon Wong, Pedro J. Ballester
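A minimal sketch of a non-parametric (random forest) scoring function in the spirit described, trained on hypothetical protein-ligand descriptors and affinities; it is neither the authors' model nor RF-Score.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 36))                 # hypothetical intermolecular descriptors per complex
y = rng.normal(loc=6.0, scale=1.5, size=300)   # hypothetical measured binding affinities (pKd)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
# No functional form is imposed: the forest learns the descriptor-affinity relationship.
print("Pearson r on held-out complexes:", np.corrcoef(rf.predict(X_te), y_te)[0, 1])
```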
The Impact of Docking Pose Generation Error on the Prediction of Binding Affinity
Abstract
Docking is a computational technique that predicts the preferred conformation and binding affinity of a ligand molecule when bound to a protein pocket. It is often employed to identify a molecule that binds tightly to the target, so that a small concentration of the molecule is sufficient to modulate its biochemical function. The use of non-parametric machine learning, a data-driven approach that circumvents the need for modeling assumptions, has recently been shown to introduce a large improvement in the accuracy of docking scoring. However, the impact of pose generation error on binding affinity prediction remained to be investigated.
Here we show that the impact of pose generation error is generally limited to a small decline in the accuracy of scoring. These machine-learning scoring functions retained the highest performance on the PDBbind v2007 core set in the common scenario where one has to predict the binding affinity of docked poses instead of that of co-crystallized poses (e.g. drug lead optimization). Nevertheless, we observed that these functions do not perform as well at predicting the near-native pose of a ligand. This suggests that having different scoring functions for different problems is a better approach than using the same scoring function for all problems.
Hongjian Li, Kwong-Sak Leung, Man-Hon Wong, Pedro J. Ballester

Special session: Computational Biostatistics For Data Integration In Systems Biomedicine

Frontmatter
High-Performance Haplotype Assembly
Abstract
The problem of haplotype assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem, which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractable (FPT) algorithms, including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process. Technological improvements are currently increasing fragment length, which drastically elevates the computational costs of such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent FPT optimal approach to MEC. WhatsHap moves the complexity from fragment length to fragment overlap and is hence of particular interest given current trends in sequencing technology. pWhatsHap further improves the efficiency of solving the MEC problem, as shown by experiments performed on datasets with high coverage.
Marco Aldinucci, Andrea Bracciali, Tobias Marschall, Murray Patterson, Nadia Pisanti, Massimo Torquati
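A minimal sketch of the MEC objective evaluated for a fixed bipartition of fragments; WhatsHap and pWhatsHap search for the bipartition that minimizes this cost, which the sketch does not attempt. The fragment data are toy values.

```python
from collections import defaultdict

def mec_cost(fragments, assignment):
    """Number of corrections needed for a given bipartition of fragments.
    fragments: list of {SNP position: allele (0/1)}; assignment: 0/1 haplotype per fragment."""
    cost = 0
    for side in (0, 1):
        column = defaultdict(lambda: [0, 0])       # position -> counts of allele 0 / allele 1
        for frag, hap in zip(fragments, assignment):
            if hap == side:
                for pos, allele in frag.items():
                    column[pos][allele] += 1
        # At each position the minority alleles on this side must be corrected.
        cost += sum(min(c0, c1) for c0, c1 in column.values())
    return cost

fragments = [{0: 0, 1: 0, 2: 1}, {1: 0, 2: 1}, {0: 1, 1: 1}, {0: 1, 2: 0}]
print(mec_cost(fragments, [0, 0, 1, 1]))
```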
Data-Intensive Computing Infrastructure Systems for Unmodified Biological Data Analysis Pipelines
Abstract
Biological data analysis is typically implemented using a deep pipeline that combines a wide array of tools and databases. These pipelines must scale to very large datasets, and consequently require parallel and distributed computing. It is therefore important to choose a hardware platform and underlying data management and processing systems well suited for processing large datasets. There are many infrastructure systems for such data-intensive computing. However, in our experience, most biological data analysis pipelines do not leverage these systems.
We give an overview of data-intensive computing infrastructure systems, and describe how we have leveraged these for: (i) scalable fault-tolerant computing for large-scale biological data; (ii) incremental updates to reduce the resource usage required to update a large-scale compendium; and (iii) interactive data analysis and exploration. We provide lessons learned and describe problems we have encountered during development and deployment. We also provide a literature survey on the use of data-intensive computing systems for biological data processing. Our results show how unmodified biological data analysis tools can benefit from infrastructure systems for data-intensive computing.
Lars Ailo Bongo, Edvard Pedersen, Martin Ernstsen
A Fine-Grained CUDA Implementation of the Multi-objective Evolutionary Approach NSGA-II: Potential Impact for Computational and Systems Biology Applications
Abstract
Many computational and systems biology challenges, in particular those related to big data analysis, can be formulated as optimization problems and therefore can be addressed using heuristics. Besides typical optimization problems formulated with respect to a single target, the possibility of optimizing multiple objectives (MO) is rapidly becoming more appealing. In this context, MO Evolutionary Algorithms (MOEAs) are one of the most widely used classes of methods for solving MO optimization problems. However, these methods can be particularly demanding from a computational point of view and, therefore, effective parallel implementations are needed. This fact, together with the wide diffusion of powerful and low-cost general-purpose Graphics Processing Units, has promoted the development of software tools that focus on the parallelization of one or more of the computational phases characterizing MOEAs. In this paper we present a fine-grained parallelization of the fast Non-dominated Sorting Genetic Algorithm (NSGA-II) for the CUDA architecture. In particular, we discuss how this solution can be exploited to solve multi-objective optimization tasks in the field of computational and systems biology.
Daniele D’Agostino, Giulia Pasquale, Ivan Merelli
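A minimal sequential sketch of the fast non-dominated sorting step that NSGA-II performs at each generation; the chapter's contribution is the fine-grained CUDA parallelization of such steps, which is not shown here.

```python
def fast_non_dominated_sort(objectives):
    """Return the Pareto fronts (lists of indices) of a population of objective vectors,
    assuming every objective is to be minimized."""
    n = len(objectives)
    dominates = lambda a, b: all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    S = [[] for _ in range(n)]       # solutions dominated by i
    counts = [0] * n                 # number of solutions dominating i
    for i in range(n):
        for j in range(n):
            if dominates(objectives[i], objectives[j]):
                S[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in S[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]

print(fast_non_dominated_sort([(1, 5), (2, 2), (3, 1), (4, 4)]))
```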
GPGPU Implementation of a Spiking Neuronal Circuit Performing Sparse Recoding
Abstract
Modeling and simulation techniques have been used extensively to study the complexities of brain circuits. Simulations of bio-realistic networks consisting of large numbers of neurons require massive computational power when they are designed to provide real-time responses at millisecond scale. A network model of the cerebellar granular layer was developed and simulated here on Graphics Processing Units (GPUs), which deliver high compute capacity at low cost. We used a mathematical model, namely the Adaptive Exponential leaky integrate-and-fire (AdEx) equations, to model the different types of neurons in the cerebellum. The hypothesis relating spatiotemporal information processing in the input layer of the cerebellum to sparse activation of cell clusters was evaluated. The main goal of this paper was to understand the computational efficiency and scalability issues of implementing a large-scale microcircuit consisting of millions of neurons and synapses. The results suggest that efficient scale-up based on pleasantly parallel modes of operation allows the simulation of large-scale spiking network models for cerebellum-like network circuits.
Manjusha Nair, Bipin Nair, Shyam Diwakar
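A minimal single-neuron sketch of the AdEx model integrated with forward Euler, using commonly quoted parameter values that are not necessarily those of the cerebellar granular-layer model in the chapter.

```python
import numpy as np

# AdEx parameters (typical textbook values; units: pF, nS, mV, ms, pA).
C, gL, EL, VT, DT = 281.0, 30.0, -70.6, -50.4, 2.0
tau_w, a, b, V_reset, V_spike = 144.0, 4.0, 80.5, -70.6, 0.0

dt, T, I = 0.1, 500.0, 800.0          # time step (ms), duration (ms), input current (pA)
V, w, spikes = EL, 0.0, []

for step in range(int(T / dt)):
    # Membrane potential and adaptation current (forward Euler step).
    dV = (gL * (EL - V) + gL * DT * np.exp((V - VT) / DT) - w + I) / C
    dw = (a * (V - EL) - w) / tau_w
    V, w = V + dt * dV, w + dt * dw
    if V >= V_spike:                  # numerical spike cutoff: reset V and increment adaptation
        V, w = V_reset, w + b
        spikes.append(step * dt)

print(f"{len(spikes)} spikes in {T} ms")
```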
NuChart-II: A Graph-Based Approach for Analysis and Interpretation of Hi-C Data
Abstract
Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expression and to DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high-throughput sequencing techniques (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, a software package that allows the user to annotate and visualize a list of input genes with information derived from Hi-C data, integrating knowledge about genomic features that are involved in the spatial organization of chromosomes. The software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable scalability problems while working genome-wide on large datasets; particular attention has therefore been paid to obtaining an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of the proximity likelihood for the genes.
Fabio Tordini, Maurizio Drocco, Ivan Merelli, Luciano Milanesi, Pietro Liò, Marco Aldinucci
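A minimal sketch of the gene-centric neighbourhood-graph idea with networkx, starting from hypothetical gene-level contact pairs rather than from raw Hi-C reads as NuChart-II does.

```python
import networkx as nx

# Hypothetical gene-level contacts derived from paired Hi-C fragments (gene1, gene2, score).
contacts = [("BCL2", "MYC", 12), ("MYC", "TP53", 7), ("BCL2", "KRAS", 4),
            ("KRAS", "EGFR", 9), ("TP53", "EGFR", 3)]

G = nx.Graph()
G.add_weighted_edges_from(contacts, weight="hic_score")

# Gene-centric neighbourhood graph: all genes within two contact hops of the seed gene.
seed = "BCL2"
neighbourhood = nx.ego_graph(G, seed, radius=2)
print(sorted(neighbourhood.nodes()), neighbourhood.number_of_edges())
```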
Backmatter
Metadata
Title
Computational Intelligence Methods for Bioinformatics and Biostatistics
Edited by
Prof. Clelia Di Serio
Dr. Pietro Liò
Alessandro Nonis
Prof. Roberto Tagliaferri
Copyright Year
2015
Electronic ISBN
978-3-319-24462-4
Print ISBN
978-3-319-24461-7
DOI
https://doi.org/10.1007/978-3-319-24462-4
