2018 | Book

Information- and Communication Theory in Molecular Biology

About this book

This edited monograph presents the collected interdisciplinary research results of the priority program “Information- and Communication Theory in Molecular Biology (InKoMBio, SPP 1395)”, funded by the German Research Foundation DFG from 2010 until 2016. The topical spectrum is very broad and comprises, but is not limited to, aspects such as microRNA as part of cell communication, information flow in mammalian signal transduction pathways, cell-cell communication, semiotic structures in biological systems, as well as the application of methods from information theory in protein interaction analysis. The target audience primarily comprises research experts in the field of biological signal processing, but the book is also beneficial for graduate students.

Table of Contents

Frontmatter

Compact Shelving of the Projects

Frontmatter
Chapter 1. Introduction
Abstract
The present book describes a broad variety of interdisciplinary results achieved within the priority program “Information- and Communication Theory in Molecular Biology (InKoMBio, SPP 1395)”, which was funded by the German Research Foundation DFG. In all projects there were at least two principal investigators from at least two fields, namely biology and information theory. The main results of almost all projects funded in InKoMBio are described in this book. InKoMBio has produced not only many interdisciplinary results but also many ongoing interdisciplinary research activities. First, a very compact shelving of the projects is given, with the objective of outlining all the ventures in a condensed manner, reduced to their essential goals and obtained results, as abstracts, supported by the most important publications released in the funding period. Afterwards, detailed descriptions of all specific projects can be found, which include some administrative data, e.g., the applicants, their scientific staff and affiliation, national and international cooperations, and a summary of all publications and educational qualifications that have been supported and facilitated by project subtopics. The core element of the descriptive parts gives background information about the investigated topic and the starting point of the research, as well as the actual work performed in the project. Each detailed report concludes with remarks about the essential results of the interdisciplinary work and possible future perspectives.
Martin Bossert

Detailed Descriptions

Frontmatter
Chapter 2. MicroRNA as an Integral Part of Cell Communication: Regularized Target Prediction and Network Prediction
Abstract
MicroRNAs, gene-encoded small RNA molecules, play an integral part in gene regulation by binding to target mRNAs and preventing their translation. The prediction of microRNA–mRNA-binding sites and the resulting interaction network is essential to understand, and thus influence, the regulation of genetic information flow inside the living organism. Numerous algorithms have been proposed based on various heuristics; however, the predictions often vary considerably. In this proposal we will extend a physical model for the binding of microRNAs to the corresponding target and establish an extended set of features influencing binding probabilities. We will be faced with the challenge of (i) too many features and (ii) few known interactions on which to train any prediction algorithm. This problem will be solved using (i) information-theoretical criteria for feature reduction, (ii) regularization, (iii) application of the Infomax approach to guarantee minimal loss of information after dimension reduction, and (iv) experimental validation of theoretical predictions using a novel test system. This strategy will allow (i) statistical analysis of the predicted microRNA–mRNA hypergraph, (ii) characterization of network motifs and hierarchies, (iii) identification of missing links, and (iv) removal of false interactions.
Rolf Backofen, Fabrizio Costa, Fabian Theis, Carsten Marr, Martin Preusse, Claude Becker, Sita Saunders, Klaus Palme, Oleksandr Dovzhenko
Chapter 3. Information Flow in a Mammalian Signal Transduction Pathway
Abstract
The mammalian signal transduction network relays detailed information about the presence and concentration of ligands on the outside of the cell to the nucleus, and alters cellular behaviour by changing gene expression. Since signal transduction pathways exhibit striking similarities to typical communication systems, the framework of information theory can be directly applied to better understand cellular signalling. During the current funding period of the priority program InKoMBio, we determined the information transmission capacities of the prototypic MAPK pathway using a combination of single cell experimentation and information theoretical calculations. Surprisingly, our results indicate that the signalling network transmits less than one bit of information. Rather than faithfully reporting extracellular concentrations of the ligand EGF, it responds in a binary manner. In addition, molecular noise interferes with a robust encoding of the presence of the input signal, limiting the information content even further. We observed similarly limited channel capacities for two other signalling pathways, the TGF\(\beta\)/SMAD and p53 networks. As many studies in different biological model systems suggest that cells can gain more than one bit of information about their environment using signalling pathways, we aim to investigate what is limiting the information transmission capabilities at the single cell level and how cells maximise the amount of information gained from external and internal sources to ensure a proper physiological response. We hypothesise that the pathways integrate information from the cellular context, which could explain the apparently low channel capacity.
We therefore propose to use information theory, single cell experimentation and mathematical modelling to study the influence of contextual information, by addressing the following specific questions: (i) how does the state of a cell influence the response to an external signal, (ii) how does the context of previous stimuli influence the response and (iii) what are common principles of context-dependent signalling across different pathways? We will use live-cell imaging and immunofluorescence assays to measure signalling and context, and calculate the contribution of contextual information using conditional mutual information, context trees and parsimonious Bayesian networks. To gain a predictive understanding of the underlying molecular mechanisms, we will expand existing mathematical models of the pathways to include the interacting regulatory processes that provide context and analyse their information theoretical properties. Using network perturbations, we will experimentally validate model predictions.
Manuela Benary, Ilias Nolis, Nils Blüthgen, Alexander Loewer
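The "less than one bit" finding in the Chapter 3 abstract can be made concrete with a toy calculation. The following is a minimal sketch, not the authors' method: it assumes a hypothetical binary input (ligand absent or present) passed through a symmetric noisy channel, and computes the mutual information between input and response. The function names and the 10% flip probability are illustrative assumptions.

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal of the input
    py = [sum(col) for col in zip(*joint)]    # marginal of the output
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                          # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


def binary_channel_joint(p_input_on, flip_prob):
    """Joint distribution of a binary input and its noisy binary response.

    Hypothetical model: the response equals the input except that it is
    flipped with probability flip_prob (a binary symmetric channel).
    """
    p1 = p_input_on
    p0 = 1.0 - p1
    return [
        [p0 * (1.0 - flip_prob), p0 * flip_prob],
        [p1 * flip_prob, p1 * (1.0 - flip_prob)],
    ]


# A noiseless binary response carries exactly one bit about the input;
# any molecular noise pushes the transmitted information below one bit.
noiseless_bits = mutual_information(binary_channel_joint(0.5, 0.0))
noisy_bits = mutual_information(binary_channel_joint(0.5, 0.1))
print(f"noiseless: {noiseless_bits:.3f} bits, noisy: {noisy_bits:.3f} bits")
```

In this toy model, a purely binary responder can never exceed one bit, and noise only lowers that ceiling, which is consistent with the qualitative picture in the abstract.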
Chapter 4. Information Theoretic Concepts to Unravel Cell–Cell Communication
Abstract
Cell–cell communication is a complex process regulating the homeostasis and cellular decisions in a multicellular organism. Correct information flow is a necessity for a healthy cellular microenvironment and proper response to external stimuli, such as inflammation and wound healing. Altered cell–cell communication is a hallmark of aging and disease. In particular, tumor–stroma interactions have attracted increased attention in recent years as putative therapeutic targets of intervention. Most studies so far have investigated individual cytokines or analyzed steady-state feedback-entangled cell–cell communication. Here, we study the onset of cell–cell communication by a defined double paracrine experimental setup of skin cells. We build on the experimental model systems developed in the first funding period and use conditioned supernatant stimulation to record whole transcriptome response time series as well as changes in the whole secretome to correlate cytokine patterns with phenotype responses. Moreover, we model the changes in gene expression and cytokine secretion with communication theoretic approaches, using independent component analysis and Gaussian processes. The information from these general models is used for mechanistic, whole cell modeling using gene regulatory networks and Boolean models that comprise long-term dynamics of the cellular responses as well as multiple time scales of protein signaling, gene expression, and auto- and paracrine feedbacks. Such approaches will elucidate bi-stability of cellular homeostasis locking the cells into inflammatory or migratory states. Lastly, we will test the generic regulatory schemes by comparison of our currently investigated skin communication model with a tumor–stroma interaction system of human melanoma and fibroblast cells.
Nikola Müller, Steffen Sass, Barbara Offermann, Amit Singh, Steffen Knauer, Andreas Schüttler, Juliana Nascimento Minardi, Fabian Theis, Hauke Busch, Melanie Boerries
Chapter 5. Finding New Overlapping Genes and Their Theory (FOG Theory)
Abstract
The general goal of the project is to find and verify new overlapping protein-coding DNA sequences in prokaryotes and to understand the underlying mechanisms with the help of models from information and communication theory. To reach these goals, a cooperation of three groups is necessary, namely a group performing in vivo and in vitro molecular biology experiments, an informatics group which can handle the huge amount of widely distributed data on gene sequences, and a group working in information and communication theory. With methods from information theory, especially from error correcting codes, the process of coding proteins via embedded genes will be studied, using new distance measures. Further, the powerful concept of random coding will be used to obtain bounds. Embedded genes will be analyzed using a coding-theoretic approach. Communication theory provides models and mechanisms for transmitting information reliably over channels which introduce errors. Evolution, as well as the process of coding proteins by overlapping genes, can be viewed as such a communication system. Both will be described and analyzed with the theory of communication systems, including synchronization mechanisms. The parameters of the models need to be verified and/or determined. Therefore, aspects of bioinformatics and molecular biology are essential. Algorithms will be developed which efficiently search databases at a large scale for new protein-coding DNA sequences in prokaryotes, embedded in annotated genes in overlapping alternative reading frames. Based on these results, experimental evaluation of embedded genes using molecular biology tools to determine the function of selected candidate genes will be performed.
Siegfried Scherer, Klaus Neuhaus, Martin Bossert, Katharina Mir, Daniel Keim, Svenja Simon
Chapter 6. The Evolutive Adaptation of the Transcriptional Information Transmission in Escherichia Coli
Abstract
Evolution is the process of adaptation of organisms to their respective environment by permanent genetic alterations. Evolutive adaptation proceeds by stochastic mutations and selection of the fittest individuals. A basic problem is to understand how a population of organisms adapts to an environment. Mutations are stochastic events on the molecular level that lead to a change of the intracellular information channels from transcription factors to genes and metabolic fluxes. For this reason a communication theoretic approach is promising. The main goal of this project is the information theoretic characterisation and analysis of the intracellular information exchange during evolutive adaptation, using Escherichia coli populations as an example. An information theoretic model of a cell population is a complex communication system where the inputs and outputs are stochastic variables, namely, transcription factor activities, gene expression, and metabolic fluxes. A cell population is considered so that population-averaged measurements can be modelled. This theoretical model will be developed and iteratively adapted to experiments on the timescale of several hundred generations in a well-defined environment for E. coli. The experiments are based on a well-established platform which was built up by the ISYS and the IMB and is used for other projects, too.
Ronny Feuer, Katrin Gottlieb, Johannes Klotz, Joachim von Wulffen, Martin Bossert, Georg Sprenger, Oliver Sawodny
Chapter 7. Improving the Reliability of RNA-seq: Approaching Single-Cell Transcriptomics To Explore Individuality in Bacteria
Abstract
The main goals of this project are (i) to improve the reliability of RNA sequencing on Illumina platforms; (ii) to develop a new, more sensitive experimental pipeline for sequencing single bacterial cells; and, finally, (iii) to explore the individual transcriptome of isogenic cells. Currently used techniques need a large number of bacterial cells for one sequencing run. Hence, to reach single-cell resolution, new library preparation approaches and amplification schemes are required, which will be developed and validated. In addition, coding theoretic methods (barcodes) need to be applied to reduce the inevitable technical variability of the sequencing process. In particular, we will develop barcodes to improve multiplexing and to reduce the amplification noise, which otherwise will hide the biological variability in the number of mRNAs in cells. This will also require establishing a comprehensive channel model of RNA-seq using statistical analysis and suitable experiments. The newly established sequencing procedure will then be used to explore the stochastic cell-to-cell variability of transcriptomic profiles. We are especially interested in the phenomenon of stochastic cell-state switching, which has not yet been studied on a genome-wide scale, and in exploring basic mechanisms of transcription events, e.g., the mechanisms causing transcriptional bursts.
Martin Bossert, David Kracht, Siegfried Scherer, Richard Landstorfer, Klaus Neuhaus
Chapter 8. Morning and Evening Peaking Rhythmic Genes are Regulated by Distinct Transcription Factors in Neurospora crassa
Abstract
Eukaryotic genes are typically regulated by multiple transcription factors in a combinatorial manner. Quantitative understanding of gene regulation is particularly relevant for oscillatory expression due to transcriptional feedback loops. For periodic gene expression, the phases are essential for physiological functions. In our project we combine bioinformatic promoter analysis, large scale experiments (expression profiles and ChIP-Seq), and kinetic modeling to explore the information transfer from activators and repressors to gene expression phases. A comparative analysis of mammalian and fungal circadian rhythms allows us to elucidate general design principles of phase regulation: enhanced amplitudes via OR funnels and generation of harmonics via AND funnels.
Robert Lehmann, Hanspeter Herzel, Michael Brunner, Gencer Sancar, Cigdem Sancar, Bharath Ananthasubramaniam
Chapter 9. Evolution of the AMP-Activated Protein Kinase Controlled Gene Regulatory Network
Abstract
Alterations in gene regulation are considered major driving forces in divergent evolution. This is reflected in different species by the variable architecture of regulatory networks controlling highly conserved metabolic pathways. While many regulatory proteins are surprisingly conserved, their wiring has evolved more rapidly. This project focuses on the adaptation to nutrient limitation, which requires the activation of the conserved AMP-activated protein kinase (AMPK alias Snf1 in yeast) and its downstream effectors. The goal is to uncover basic principles of adaptation and steps in the evolutionary process associated with regulatory network rearrangement. This requires improving the prediction of gene regulation based on experimental data, DNA sequence information and information theory. In this project Context Tree (CT) models and Parsimonious Context Tree (PCT) models and the corresponding algorithms for extended Context Tree Maximization (CTM) and extended Parsimonious Context Tree Maximization (PCTM) are derived, implemented, and applied. Computational predictions and experimental validation will establish an iterative cycle to improve algorithms in each cycle, leading to a growing set of experimentally verified and falsified predictions, finally allowing a deeper understanding of the evolution of the transcriptional regulatory network controlling energy metabolism, one of the most fundamental processes conserved across all kingdoms of life.
Constance Mehlgarten, Ralf Eggeling, André Gohr, Markus Bönn, Ioana Lemnian, Martin Nettling, Katharina Strödecke, Carolin Kleindienst, Ivo Grosse, Karin D. Breunig
Chapter 10. Semiotic Structures and Meaningful Information in Biological Systems
Abstract
The project addresses the semantic aspect of biological information. We will develop novel methods to objectively identify and describe semiotic subsystems of living cells. The basic idea relies on the identification of organic codes (as recently reviewed by Barbieri, Naturwissenschaften 95, 577–599, 2008) and on how these codes are physically instantiated. First, we develop formal concepts and measures that allow us to describe and quantify organic codes based on experimental observations. Second, for validation, we will apply this method to already known biological codes (e.g., the genetic code) and to an in-silico artificial chemistry, in which chemical information processing can appear spontaneously and can evolve. Third, we will apply these methods to concrete biological signaling systems, in which the codes are more difficult to identify. In particular we investigate (a) microbial communication systems (chemotactic signaling in social amoeba, quorum sensing) and (b) kinetochore proteins and their involvement in the control of mitosis (especially the spindle assembly checkpoint). As a benefit, this project will deliver a novel way to describe and understand biological systems from a semantic perspective. We will be able to compare and classify biological information processing at the molecular level. The semantic description may also enable us to explain the evolution of bacterial languages and to design novel cellular circuits in the context of synthetic biology.
Stephan Diekmann, Peter Dittrich, Bashar Ibrahim
Chapter 11. Information Transfer in the Mammalian Circadian Clock
Abstract
Most species evolved a circadian clock to adapt to the 24 h period of the solar day. In mammals, these clocks generate endogenous rhythms by regulatory gene networks in almost every cell. A pacemaker, the suprachiasmatic nucleus (SCN) as the master clock, receives environmental input and orchestrates peripheral organs via sympathetic innervation, temperature and humoral factors. However, the mechanisms by which this synchronization is achieved are largely unknown. In order to elucidate paradigms of environmental information transfer within the circadian network, we address the following questions: How is environmental information perceived by different circadian networks? Do different circadian networks vary in their responses to a given signal, and, if so, do the differences depend on inherent circadian properties? Which part of the signal (onset, offset, duration, strength) is relevant for the responses? To address these questions, we combine experimental data from cultured single cells and organotypic slices with mathematical models of circadian oscillators and find that temperature signals have a strong impact on circadian rhythms, depending on the specific circadian properties of the clock cells.
Adrián E. Granada, Hanspeter Herzel, Achim Kramer, Ute Abraham
Chapter 12. The DNA from a Coding Perspective
Abstract
The general aim of the project was to investigate and understand the coding structure in the DNA by using information theoretic, coding, and communication tools along with molecular genetics approaches. The codon encoding structure and possible mutations are modeled as a communication channel to investigate and obtain a clearer view of the codon to amino acid mappings. In addition, principles observed in the DNA are transferred into technical source coding methods. The universal source coding algorithms proposed by Lempel and Ziv in 1977 and 1978 and by Welch in 1984 actually show some similarity to alternative splicing known from eukaryotes, which further increases the variability in protein encoding. Thus, the algorithms are modified by employing bi-directional reading procedures. The other main focus is on the comprehensive description of DNA as a dual coding device carrying two (digital and analog) types of information and on the biological meaning of this capacity. In particular, the interdependence of digital and analog coding properties of the DNA is studied with regard to regulation of genetic function. To this end, the role of the spatial order of genes and the genomic gradients of DNA thermodynamic stability and superhelical density in the bacterial chromosome is investigated. The analysis is performed with regard to alterations of spatiotemporal gene expression patterns and entropy (Shannon and Gibbs) profiles. Furthermore, the role of the DNA configuration and the organization of transient chromosomal structural-functional domains (TSFDs) in coordinating genomic expression with environmental changes is explored in wild type E. coli cells and in mutants lacking the chromosome-shaping factors, as well as in the plant pathogenic bacterium D. dadantii.
Werner Henkel, Georgi Muskhelishvili, Dawit Nigatu, Patrick Sobetzko
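As background for the Lempel–Ziv reference in the Chapter 12 abstract, the following minimal sketch shows standard, forward-reading LZ78 parsing applied to a short DNA-like string. It is not the bi-directional modification the abstract mentions; the function names and the example sequence are illustrative assumptions.

```python
def lz78_parse(sequence):
    """Parse a string into LZ78 phrases as (dictionary index, next symbol) pairs.

    Index 0 denotes the empty phrase; each emitted pair defines a new
    dictionary entry equal to the referenced phrase plus one symbol.
    """
    dictionary = {"": 0}
    output = []
    phrase = ""
    for symbol in sequence:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate            # keep extending the current match
        else:
            output.append((dictionary[phrase], symbol))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as prefix + last symbol.
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output


def lz78_decode(pairs):
    """Rebuild the original string from LZ78 (index, symbol) pairs."""
    phrases = [""]
    pieces = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


example = "ACGACGACGT"
pairs = lz78_parse(example)
print(pairs)
print(lz78_decode(pairs) == example)
```

Repeated substrings are encoded as references to earlier phrases, so repetitive sequences produce fewer pairs relative to their length than irregular ones.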
Chapter 13. Application of Methods from Information Theory in Protein-Interaction Analysis
Abstract
The interaction of proteins with other biomolecules plays a central role in various aspects of the structural and functional organization of the cell. The elucidation of these interactions is crucial to understand processes such as metabolic control, signal transduction, and gene regulation. However, an experimental structural characterization of all of them is impractical, and only a small fraction of the potential complexes will be amenable to direct experimental analysis. Docking represents a versatile and powerful method to predict the geometry of protein–protein complexes. However, despite significant methodical advances, the identification of good docking solutions among a large number of false solutions still remains a difficult task. The present work allowed us to adapt the formalism of mutual information (MI) from information theory to protein docking. In this context, we have developed a method which finds a lower bound for the MI between a binary and an arbitrary finite random variable with joint distributions that have a variational distance not greater than a known value to a known joint distribution. This lower bound can be applied to MI estimation with confidence intervals. Unlike previous results, these confidence intervals do not need any assumptions on the distribution or the sample size. An MI-based optimization protocol in conjunction with a clustering procedure was used to define reduced amino acid alphabets describing the interface properties of protein complexes. The reduced alphabets were subsequently converted into a scoring function for the evaluation of docking solutions, which is available for public use via a web service. The approach outlined above has recently been extended to the analysis of protein–DNA complexes by also taking into account geometrical parameters of the DNA.
Arno G. Stefani, Achim Sandmann, Andreas Burkovski, Johannes B. Huber, Heinrich Sticht, Christophe Jardin
Chapter 14. Identification of Causal Dependences in Gene Regulatory Networks Using Algorithmic Information Theory
Abstract
This project aims at analyzing the causal structure of genetic regulatory networks of stem cells of plants using novel causal inference techniques to be developed here. Known methods for causal inference from statistical data usually require a large number of samples. Our preliminary work shows that it is in principle possible to infer causal relations from sample size one if the variables are high-dimensional, since algorithmic information provides additional hints on causal directions. Recent advances in genomic methods have allowed the simultaneous quantification of all genes in an organism. To identify the causal relation between individual transcripts, we will use inducible expression to analyze the effect of the homeodomain transcription factor WUSCHEL on the regulatory network of plant stem cell control. After appropriate clustering of the genes, we obtain a causal network between extremely high-dimensional variables, to which algorithmic information theory based methods can be applied. The inferred causal relation will then be tested by advanced experiments.
Jan Lohmann, Dominik Janzing
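Chapter 15. Molekulare Mechanismen der Datenintegration und Entscheidung zur Einleitung der Reproduktiven Phase in Pflanzen (Molecular Mechanisms of Data Integration and Decision-Making for Initiating the Reproductive Phase in Plants)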
Abstract
Within this joint project, we would like to address the question of why plants show such a distinctive preference to combine a long-term winter memory and cues from photoperiod to track seasons. We will make use of the genetic model plant Arabidopsis, for which the molecular pathways involved in seasonal control have been best studied. Using mathematical modelling, we will simulate different molecular representations of flowering time control, in which either winter memory or changes in daytime length are required to precisely track season, and quantify the ability of the different scenarios to correctly identify annual seasons. Additionally, we will investigate whether tracking of daytime length is an evolutionarily stable strategy or can be invaded by winter memory. The experimental approach aims to implement a synthetic network in Arabidopsis that allows predicting season without vernalization requirement. This involves redirecting day length information to enter the epigenetic winter memory at the FLC locus or bypassing the requirement for FLC by directly altering the balance of florigen and anti-florigen expression. The synthetic approach will be assisted by mathematical modelling to predict the minimum requirements needed to circumvent vernalization. Additionally, we will quantify information integration and decision-making at FT by genetic manipulations.
Markus Kollmann, Franziska Turck
Chapter 16. An Information Theoretic Approach to Stimulus Processing in the Olfactory System
Abstract
Biological communication and information systems have evolved over millions of years. Although they have been optimized under different design criteria than recent man-made technical communication systems, both are subject to the same information theoretic principles. It is the purpose of this proposal to design manageable channel models which describe information flow and signal processing by cellular and neural entities. In biology, channels are formed by transmitting intertwined chemical and electrical stimuli. A typical yet still tractable example is the olfactory system of mammals. Mice will be used as a model to explore the basic principles of information exchange between sensory neurons and the brain by information theoretic means. Massive parallelism, optimal quantization, and information fusion will be important challenges to cope with. The final goal of this proposal is twofold. First, biologists will be provided with analytical models to simulate certain aspects of neural processes on a purely numerical basis. Second, the functionality of biological transmission channels will be explored, the basic principles will be isolated and useful features will be carried over to technical communication systems.
Martijn Arts, Rudolf Mathar, Marc Spehr
Chapter 17. RNA Structures as Processing Signals
Abstract
Gene expression is regulated on several levels, one of which is the level of RNA maturation. While we have some knowledge about RNA processing in Bacteria and Eukarya, little is known about this process in the third domain of life, Archaea. The aim of this project is to identify processing signals in the model archaeon, the halophilic archaeon Haloferax volcanii. To achieve that goal, two approaches will be taken: (i) an in silico approach and (ii) an experimental approach. Both approaches are dependent upon and complementary to each other. Processing sites identified in silico will be tested in the laboratory and results from the experimental approach will allow refinement of parameters for the in silico searches. In this interdisciplinary project the partner from computer science/information theory will help design software tools (based on information theoretic methods) to identify putative positions in the Haloferax genome where tRNA-like structures exist. In addition, the complete set of newly identified processing sites will be searched (and clustered) for common features. The proposed project will lead to the identification of maturation signals in Haloferax volcanii, unravelling the RNA processing pathways and thus this level of regulation of gene expression in Haloferax.
Uwe Schöning, Thomas Schnattinger, Hans A. Kestler, Britta Stoll, Anita Marchfelder
Metadata
Title
Information- and Communication Theory in Molecular Biology
Edited by
Prof. Dr. Martin Bossert
Copyright year
2018
Electronic ISBN
978-3-319-54729-9
Print ISBN
978-3-319-54728-2
DOI
https://doi.org/10.1007/978-3-319-54729-9
