Skip to main content

2018 | Buch

Algorithms for Computational Biology

5th International Conference, AlCoB 2018, Hong Kong, China, June 25–26, 2018, Proceedings

insite
SUCHEN

Über dieses Buch

This book constitutes the proceedings of the 5th InternationalConference on Algorithms for Computational Biology, AlCoB 2018, held in Hong Kong, China, in June 2018.
The 11 full papers presented together with 1 invited paper were carefully reviewed and selected from 20 submissions. They are organized in the following topical sections: Phylogenetics, Sequence Rearrangement and Analysis, Systems Biology and Other Biological Processes.

Inhaltsverzeichnis

Frontmatter

Invited Talk

Frontmatter
Algorithms for Analysis and Control of Boolean Networks
Abstract
Boolean network is a discrete mathematical model of gene regulatory networks. In this short article, we briefly review algorithmic results on finding attractors in Boolean networks. Since it is known that the problem of finding a singleton attractor is NP-hard and the problem can be trivially solved in \(O^{*}(2^n)\) time (under a reasonable assumption), we focus on special cases in which the problem can be solved in \(O((2-\delta )^n)\) time for some constant \(\delta >0\). We also briefly review algorithmic results on control of Boolean networks.
Tatsuya Akutsu

Phylogenetics

Frontmatter
Reconciliation Feasibility of Non-binary Gene Trees Under a Duplication-Loss-Coalescence Model
Abstract
Phylogenetic tree reconciliation is a widely-used method to understand gene family evolution. For eukaryotes, the duplication-loss-coalescence (DLC) model seeks to explain incongruence between gene trees and species trees by postulating gene duplication, gene loss, and deep coalescence events. While efficient algorithms exist for inferring optimal DLC reconciliations, they assume that only one individual is sampled per species. In recent work, we demonstrated that with additional samples, there exist gene tree topologies that are impossible to reconcile with any species tree. However, our algorithm required the gene tree to be binary whereas, in practice, gene trees are often non-binary due to uncertainty in the reconstruction process. In this work, we consider for the first time reconciliation under the DLC model with non-binary gene trees. Specifically, we describe an efficient algorithm that takes as input an arbitrary gene tree with an arbitrary number of samples per species and either (1) determines that there is a valid reconcilable binary resolution of that tree and constructs one such resolution or (2) determines that there exists no valid reconcilable binary resolution of that tree. Our work makes it possible to systematically analyze non-binary gene trees and will help biologists identify incorrect gene tree topologies and thus avoid incorrect evolutionary inferences.
Ricson Cheng, Matthew Dohlen, Chen Pekker, Gabriel Quiroz, Jincheng Wang, Ran Libeskind-Hadas, Yi-Chieh Wu
Solving the Tree Containment Problem for Reticulation-Visible Networks in Linear Time
Abstract
The tree containment problem (TCP) is a fundamental problem in phylogenetic study. It was introduced as a mean for verifying whether a network is consistent with a binary tree. The containment problem is NP-complete, even if the network input is binary. If the input is restricted to reticulation-visible networks, the TCP has been proved to be solvable in quadratic time. In this paper, we show that there is a linear time TCP algorithm for binary reticulation-visible networks.
Andreas D. M. Gunawan
Polynomial-Time Algorithms for Phylogenetic Inference Problems
Abstract
A common problem in phylogenetics is to try to infer a species phylogeny from gene trees. We consider different variants of this problem. The first variant, called Unrestricted Minimal Episodes Inference, aims at inferring a species tree based on a model of speciation and duplication where duplications are clustered in duplication episodes. The goal is to minimize the number of such episodes. The second variant, Parental Hybridization, aims at inferring a species network based on a model of speciation and reticulation. The goal is to minimize the number of reticulation events. It is a variant of the well-studied Hybridization Number problem with a more generous view on which gene trees are consistent with a given species network. We show that these seemingly different problems are in fact closely related and can, surprisingly, both be solved in polynomial time, using a structure we call “beaded trees”. However, we also show that methods based on these problems have to be used with care because the optimal species phylogenies always have some restricted form. We discuss several possibilities to overcome this problem.
Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, Norbert Zeh

Sequence Rearrangement and Analysis

Frontmatter
Approximation Algorithms for Sorting Permutations by Fragmentation-Weighted Operations
Abstract
Rearrangements are mutations that affect large portions of a genome. When comparing two genomes, one wants to find a sequence of rearrangements that transforms one into another. When we use permutations to represent the genomes, this reduces to the problem of sorting a permutation using some sequence of rearrangements. The traditional approach is to find a sequence of minimum length. However, some studies show that some rearrangements are more likely to disturb an individual, and so a weighted approach is closer to the real evolutionary process. Most weighted approaches consider that the cost of a rearrangement can be related to its type or to the number of elements affected by it. This work introduces a new type of cost function, which is related to the amount of fragmentation caused by a rearrangement. We present some results about lower and upper bounds for the fragmentation-weighted problems and the relation between the unweighted and the fragmentation-weighted approach. Our main results are 2-approximation algorithms for 5 versions of the fragmentation-weighted problem involving reversals and transpositions events.
Alexsandro Oliveira Alexandrino, Carla Negri Lintzmayer, Zanoni Dias
Heuristics for the Sorting Signed Permutations by Reversals and Transpositions Problem
Abstract
We present two heuristics, Sliding Window and Look Ahead, to improve solutions for the Sorting Signed Permutations by Reversals and Transpositions Problem. To assess the heuristics, we implemented algorithms described in the literature to provide initial solutions. Despite the fact that we addressed a specific problem, both heuristics can be applied to many others within the area of genome rearrangement. When time is a crucial factor, Sliding Window is a better choice because it runs in linear time and improves the initial solutions in 76.4% of cases. If the quality of the solution is a priority, Look Ahead should be preferred because it improves the initial solutions in 97.6% of cases, but it runs in \(\mathcal {O}(n^3 \times alg(n))\), where alg(n) is the complexity of the algorithm given as input. By using these heuristics one may find a good tradeoff between running time and solution improvement.
Klairton Lima Brito, Andre Rodrigues Oliveira, Ulisses Dias, Zanoni Dias
Sorting Permutations by Limited-Size Operations
Abstract
Estimating the evolutionary distance between genomes of two organisms is a challenging task for Computational Biology. One of the most well-accepted ways to do this is to consider the size of the smallest sequence of rearrangement events required to transform one genome into another, characterizing the rearrangement distance problem. Computationally, genomes can be represented as permutations of integers and, with this, the problem can be reduced to transforming a permutation into the identity with the minimum number of operations (sorting the permutation). These operations are given by a rearrangement model and they affect segments of a genome in different ways. Among the most common models are those that allow only reversals, only transpositions, or both of them. In this paper we study sorting permutations when a restriction of biological relevance is added: the size of the rearrangements should be at most a given value \(\lambda \). Some results are known for \(\lambda = 2\) and \(\lambda = 3\) but, to the best of our knowledge, there are no results for \(\lambda > 3\). We consider rearrangement models that allow reversals and/or transpositions for sorting unsigned permutations given any value of \(\lambda \). We present approximation algorithms for 3 such problems, where the approximation factors depend on \(\lambda \) and/or on the size of the permutations.
Guilherme Henrique Santos Miranda, Carla Negri Lintzmayer, Zanoni Dias
Fast Algorithm for Vernier Search of Long Repeats in DNA Sequences with Bounded Error Density
Abstract
A novel algorithm to find all sufficiently long repeating nucleotide substrings in one or several DNA sequences is proposed. The algorithm searches approximately matching strings very fast with given level of local error density. Some biological applications illustrate the method.
Sergey P. Tsarev, Maria Y. Senashova, Michael G. Sadovsky

Systems Biology and Other Biological Processes

Frontmatter
Fixed Parameter Algorithms and Hardness of Approximation Results for the Structural Target Controllability Problem
Abstract
Recent research has revealed new applications of network control science within bio-medicine, pharmacology, and medical therapeutics. These new insights and new applications generated in turn a rediscovery of some old, unresolved algorithmic questions, this time with a much stronger motivation for their tackling. One of these questions regards the so-called Structural Target Control optimization problem, known in previous literature also as Structural Output Controllability problem. Given a directed network (graph) and a target subset of nodes, the task is to select a small (or the smallest) set of nodes from which the target can be independently controlled, i.e., it can be driven from any given initial configuration to any desired final one, through a finite sequence of input values. In recent work, this problem has been shown to be NP-hard, and several heuristic algorithms were introduced and analyzed, both on randomly generated networks, and on bio-medical ones. In this paper, we show that the Structural Target Controllability problem is fixed parameter tractable when parameterized by the number of target nodes. We also prove that the problem is hard to approximate at a factor better than \(O(\log {n})\).
Eugen Czeizler, Alexandru Popa, Victor Popescu
Symbolic Detection of Steady States of Autonomous Differential Biological Systems by Transformation into Block Triangular Form
Abstract
In this paper we propose a method for transforming a square polynomial set into block triangular form by using Tarjan’s algorithm. The proposed method is then applied to symbolic detection of steady states of autonomous differential biological systems which are usually sparse systems with a large number of loosely coupling variables. Two biological systems of 12 and 43 variables respectively are studied to illustrate the effectiveness of the proposed method.
Chenqi Mou
Consensus Decoding of Recurrent Neural Network Basecallers
Abstract
There is an extensive literature using probabilistic models, such as hidden Markov models, for the analysis of biological sequences. These models have a clear theoretical basis, and many heuristics have been developed to reduce the time and memory requirements of the dynamic programming algorithms used for their inference. Nevertheless, mirroring the shift in natural language processing, bioinformatics is increasingly seeing higher accuracy predictions made by recurrent neural networks (RNN). This shift is exemplified by basecalling on the Oxford Nanopore Technologies’ sequencing platform, in which a continuous time series of current measurements is mapped to a string of nucleotides. Current basecallers have applied connectionist temporal classification (CTC), a method originally developed for speech recognition, and focused on the task of decoding RNN output from a single read. We wish to extend this method for the more general task of consensus basecalling from multiple reads, and in doing so, exploit the gains in both accelerated algorithms for sequence analysis and recurrent neural networks, areas that have advanced in parallel over the past decade. To this end, we develop a dynamic programming algorithm for consensus decoding from a pair of RNNs, and show that it can be readily optimized with the use of an alignment envelope. We express this decoding in the notation of finite state automata, and show that pair RNN decoding can be compactly represented using automata operations. We additionally introduce a set of Markov chain Monte Carlo moves for consensus basecalling multiple reads.
Jordi Silvestre-Ryan, Ian Holmes
Utilize Imputation Method and Meta-analysis to Identify DNA-Methylation-Mediated microRNAs in Ovarian Cancer
Abstract
Both epigenetics and genetic alterations are associated with cancer formation. Identification of prognostic biomarkers, DNA-methylation-mediated miRNAs, is an important step towards developing therapeutic treatment of cancer. Ovarian cancer is one of the most lethal cancers among the females, it was selected for the present study. The TCGA database provides large volume of data for cancer study, which is useful if one can combine different batches of datasets; hence, higher confident results can be obtained. There are several issues arise in data integration, i.e., missing data problem, data heterogeneity problem and the need of construct an automatic platform to reduce human intervention.
Method. Both the normal and ovarian tumor datasets were obtained from the TCGA database. To interpolate the missing methylation values, we employed the KNN imputation method. Simulation tests were performed to obtain the optimal k value. We utilized meta-analysis to minimize the heterogeneity problem and derived statistical significant DNA methylation-mediated-miRNA events. Finally, a semi-automatic pipeline was constructed to facilitate the imputation and meta-analysis studies; thus, identify potential epigenetic biomarkers in a more efficient manner.
Results. Both epigenetic- and TF-mediated effects were examined, which allow us to remove false positive events. The methylation-mediated-miRNA pairs identified by our platform are in-line with literature studies.
Conclusion. We have demonstrated that our imputation and meta-analysis pipeline led to better performance and efficiency in detecting methylation-mediated-miRNA pairs. Furthermore, this study reveals the association between aberrant DNA methylation and alternated miRNA expression, which contributes to better knowledge of the role of epigenetics regulation in ovarian cancer formation.
Ezra B. Wijaya, Erwandy Lim, David Agustriawan, Chien-Hung Huang, Jeffrey J. P. Tsai, Ka-Lok Ng
Backmatter
Metadaten
Titel
Algorithms for Computational Biology
herausgegeben von
Jesper Jansson
Carlos Martín-Vide
Miguel A. Vega-Rodríguez
Copyright-Jahr
2018
Electronic ISBN
978-3-319-91938-6
Print ISBN
978-3-319-91937-9
DOI
https://doi.org/10.1007/978-3-319-91938-6

Premium Partner