2005 | Book

Comparative Genomics

RECOMB 2004 International Workshop, RCG 2004, Bertinoro, Italy, October 16-19, 2004, Revised Selected Papers

Table of Contents

Frontmatter
Conservation of Combinatorial Structures in Evolution Scenarios
Abstract
This paper investigates the problem of conservation of combinatorial structures in genome rearrangement scenarios. We give a characterization of a class of scenarios that conserve all common intervals, called commuting scenarios, and a characterization of permutations for which commuting scenarios exist. We show that measuring conservation of common intervals can be a useful tool in assessing the quality of rearrangement scenarios, by investigating in detail three specific scenarios involving the mouse, rat and human X chromosomes.
Sèverine Bérard, Anne Bergeron, Cedric Chauve
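
To make the notion of a common interval in the abstract above concrete, here is a minimal sketch. It assumes the standard definition for unsigned permutations (a set of at least two genes that is contiguous in both orders) and uses a simple quadratic enumeration. The function name `common_intervals` is hypothetical, and this is not the characterization of commuting scenarios given in the paper.

```python
def common_intervals(p, q):
    """List the gene sets (size >= 2) that are contiguous in both p and q.

    Assumption: p and q are unsigned permutations of the same genes.
    Brute-force sketch with O(n^2) windows, for illustration only.
    """
    # Relabel each gene of p by its position in q, so q becomes the identity.
    pos_in_q = {gene: i for i, gene in enumerate(q)}
    relabeled = [pos_in_q[gene] for gene in p]

    found = []
    n = len(p)
    for i in range(n):
        lo = hi = relabeled[i]
        for j in range(i + 1, n):
            lo = min(lo, relabeled[j])
            hi = max(hi, relabeled[j])
            # The window p[i..j] is contiguous in q exactly when its
            # positions in q span as many slots as the window has genes.
            if hi - lo == j - i:
                found.append(frozenset(p[i:j + 1]))
    return found


if __name__ == "__main__":
    for interval in common_intervals([1, 2, 3, 4], [2, 1, 3, 4]):
        print(sorted(interval))
```

A scenario could then be scored by checking, after each rearrangement step, how many of these sets remain contiguous.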
Toward a Phylogenetically Aware Algorithm for Fast DNA Similarity Search
Abstract
High-throughput DNA sequencing is now producing collections of genomes from moderately or closely related organisms. Such a collection may be represented as a multiple alignment M of orthologous sequences, which induces a phylogenetic tree τ. Long-range genomic alignments with phylogenies have not yet found a prominent place in BLAST-like similarity search algorithms, though using them directly as databases can potentially yield more accurate and more informative alignments.
This work describes how to construct local alignments between a query and a multiple alignment in a way that explicitly uses a phylogenetic tree τ. We give an EM algorithm to find a locally optimal alignment when the location of the query on the tree τ is not known. An initial implementation of the method is tested on a large multiple alignment of sequences from eight vertebrate genomes.
Jeremy Buhler, Rachel Nordgren
Multiple Genome Alignment by Clustering Pairwise Matches
Abstract
We have developed a multiple genome alignment algorithm by using a sequence clustering algorithm to combine local pairwise genome sequence matches produced by pairwise genome alignments, e.g., BLASTZ. Sequence clustering algorithms often generate clusters of sequences such that there exists a common shared region among all sequences in each cluster. To use a sequence clustering algorithm for genome alignment, it is necessary to handle numerous local alignments between a pair of genomes. We propose a multiple genome alignment method that converts the multiple genome alignment problem to the sequence clustering problem. This method does not need to make a guide tree to determine the order of multiple alignment, and it accurately detects multiple homologous regions. As a result, our multiple genome alignment algorithm performs competitively with existing algorithms. This is shown using an experiment which compares the performance of TBA, MultiPipMaker (MPM) and our algorithm in aligning 12 groups of 56 microbial genomes and by evaluating the number of common COGs detected.
Jeong-Hyeon Choi, Kwangmin Choi, Hwan-Gue Cho, Sun Kim
On the Structure of Reconciliations
Abstract
In this paper we present an extended model related to reconciliation concepts. It is based on gene duplications, gene losses and speciation events. We define an evolutionary scenario (called a DLS-tree) which informally can represent an evolution of genes in species. We are interested in all scenarios, not only parsimonious ones. We propose a system of rules for transforming the scenarios. We prove that the system is confluent, sound and strongly normalizing. We show that a scenario in normal form (i.e. non-reducible) is unique and minimal in the sense of the cost computed as the total number of gene duplications and losses. Moreover, we present a classification of the scenarios and analyze their hierarchy. Finally, we prove that the tree in normal form can be easily transformed into the reconciled tree [12] in the duplication-loss model. This solves some open problems stated in [13].
Paweł Górecki, Jerzy Tiuryn
The Statistical Significance of Max-Gap Clusters
Abstract
Identifying gene clusters, genomic regions that share local similarities in gene organization, is a prerequisite for many different types of genomic analyses, including operon prediction, reconstruction of chromosomal rearrangements, and detection of whole-genome duplications. A number of formal definitions of gene clusters have been proposed, as well as methods for finding such clusters and/or statistical tests for determining their significance. Unfortunately, there is very little overlap between previously published rigorous analytical statistical tests and the definitions used in practice. In this paper, we consider the max-gap cluster: a contiguous region containing a maximal set of homologs, where the number of non-homologous genes between pairs of adjacent homologs is never greater than a predefined, fixed parameter, g. Although this is one of the models most widely used in practice, currently the statistical significance of max-gap clusters can only be evaluated using Monte Carlo simulations because no analytical statistical tests have been developed for it. We give exact expressions for the probability of observing such a cluster by chance, assuming a simple reference-region scenario and random gene order, as well as more efficient methods for approximating this probability. We use these methods to identify which regions of the parameter space yield clusters that are statistically significant. Finally, we discuss some of the challenges in extending this model to whole-genome comparison.
Rose Hoberman, David Sankoff, Dannie Durand
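
The max-gap definition in the abstract above can be sketched in a few lines. This is a simplified illustration for a single linear gene order only: it groups homologs so that at most `g` non-homologous genes separate adjacent members. The function name and the input encoding are assumptions, and the sketch does not compute the significance tests the paper develops.

```python
def max_gap_clusters(genome, homologs, g):
    """Group positions of homologs into max-gap clusters.

    genome   : list of gene identifiers in chromosomal order (assumed linear)
    homologs : set of gene identifiers of interest
    g        : maximum number of non-homologous genes allowed between
               two adjacent homologs of the same cluster
    Returns a list of clusters, each a list of positions in `genome`.
    """
    positions = [i for i, gene in enumerate(genome) if gene in homologs]
    clusters = []
    current = []
    for pos in positions:
        # pos - current[-1] - 1 counts the intervening non-homologous genes.
        if current and pos - current[-1] - 1 > g:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    order = list("aXbXXcXXXXd")  # 'X' marks a non-homologous gene
    print(max_gap_clusters(order, {"a", "b", "c", "d"}, g=2))
```

Because each cluster is extended greedily until a gap exceeds `g`, the groups are maximal by construction, which matches the "maximal set of homologs" wording of the definition.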
Identifying Evolutionarily Conserved Segments Among Multiple Divergent and Rearranged Genomes
Abstract
We describe a new method for reliably identifying conserved segments among genome sequences that have undergone rearrangement, horizontal transfer, and substantial nucleotide-level divergence. A Gibbs-like sampler explores different combinations of sequence-based markers shared by the genomes under study. The sampler assigns each marker a posterior probability based on how frequently it participates in some collinear group of markers. Markers with high p.p. values are likely members of conserved segments. The method identifies both large-scale and local trends in segmental collinearity, providing suitable input for genome alignment and rearrangement history inference tools. Applying our method to genomes of four Streptococci reveals that rearranged segments in these organisms belong in two size categories: large conserved segments that are interrupted by a staccato of single gene or operon-size small segments. The rearrangement pattern of large segments is best explained by symmetric inversions about the origin of replication while the pattern of small segments is not.
Bob Mau, Aaron E. Darling, Nicole T. Perna
Genome Rearrangement in Mitochondria and Its Computational Biology
Abstract
In the first part of this paper, we investigate gene orders of closely related mitochondrial genomes to study the properties of mutations that rearrange genes in mitochondria. Our conclusions are that the evolution of mitochondrial genomes is more complicated than recent methods assume, and that stochastic modelling is necessary for a deeper understanding and more accurate inference. The second part is a review of the Markov chain Monte Carlo approaches for the stochastic modelling of genome rearrangement, which seem to be the only computationally tractable approach to this problem. We introduce the concept of partial importance sampling, which yields a class of Markov chains that are efficient both in terms of mixing and computational time. We also give a list of open algorithmic problems whose solution might help improve the efficiency of partial importance samplers.
István Miklós, Jotun Hein
The Distribution of Inversion Lengths in Bacteria
Abstract
The distribution of the lengths of genomic segments inverted during the evolutionary divergence of two species cannot be inferred directly from the output of genome rearrangement algorithms, due to the rapid loss of signal from all but the shortest inversions. The number of short inversions produced by these algorithms, however, particularly those involving a single gene, is relatively reliable. To gain some insight into the shape of the inversion-length distribution we first apply a genome rearrangement algorithm to each of 32 pairs of bacterial genomes. For each pair we then simulate their divergence using a test distribution to generate the inversions and use the simulated genomes as input to the reconstruction algorithm. It is the comparison between the algorithm output for the real pair of genomes and the simulated pair which is used to assess the test distribution. We find that simulations based on the exponential distribution cannot provide a good fit, but that simulations based on a gamma distribution can account for both single-gene inversions and short inversions involving at most 20 genes, and we conclude that the shape of the latter distribution corresponds well to the true distribution at least for small inversion lengths.
David Sankoff, Jean-François Lefebvre, Elisabeth Tillier, Adrian Maler, Nadia El-Mabrouk
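
The simulation step described in the abstract above (generating inversions from a test distribution) might look roughly like the following sketch. Everything specific here is an assumption made for illustration: a linear rather than circular chromosome, uniformly chosen start positions, rounding of gamma draws to whole genes, and the parameter names `shape` and `scale`. The paper's actual simulation protocol and fitted parameters are not given in the abstract.

```python
import random


def simulate_inversions(n_genes, n_inversions, shape, scale, seed=0):
    """Apply random inversions with gamma-distributed lengths.

    Assumptions (not from the source): linear chromosome, uniform start
    positions, lengths rounded and clamped to the range 1..n_genes.
    Returns the final signed gene order and the list of inversion lengths.
    """
    rng = random.Random(seed)
    genome = list(range(1, n_genes + 1))  # signed identity permutation
    lengths = []
    for _ in range(n_inversions):
        # Draw a length in genes from the test distribution.
        length = round(rng.gammavariate(shape, scale))
        length = max(1, min(n_genes, length))
        start = rng.randrange(n_genes - length + 1)
        segment = genome[start:start + length]
        # An inversion reverses the segment and flips every gene's sign.
        genome[start:start + length] = [-gene for gene in reversed(segment)]
        lengths.append(length)
    return genome, lengths


if __name__ == "__main__":
    final_order, used_lengths = simulate_inversions(50, 10, shape=1.0, scale=5.0)
    print(final_order)
    print(used_lengths)
```

In the workflow the abstract outlines, the simulated gene order would be passed to a rearrangement reconstruction algorithm and its inferred inversions compared with those inferred for the real genome pair.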
Estimators of Translocations and Inversions in Comparative Maps
Abstract
In a comparative map, the number of translocations in the evolutionary history of a chromosome can be estimated solely on the basis of the conserved syntenies it contains. This estimate, subtracted from the number of conserved segments, then allows the estimation of the number of inversions that have affected the chromosome. Summing these estimates over all chromosomes provides a startlingly accurate estimator (as assessed by simulation) of the total number of rearrangements of each type occurring in the evolutionary divergence of two genomes.
David Sankoff, Matthew Mazowita
Databases for Comparative Analysis of Human-Mouse Orthologous Alternative Splicing
Abstract
Comparative analyses of alternative splicing across species can provide significant biological insight not only into the evolution of alternative splicing, but also into its regulation and functional significance. For comprehensive analyses of human and mouse alternatively spliced genes, we developed two databases of the human and the mouse transcriptomes, HumanSDB3 and MouSDB5 respectively. We showed that alternative splicing events in both of the transcriptomes are mainly due to the presence or absence of internal cassette exons. Our databases allow in-depth analyses of alternative and constitutive exons within alternatively spliced genes. The interactive web implementation of our databases gives end users the ability to instantly identify orthologous human-mouse gene pairs with their corresponding exons. This is a novel visualization method which provides easy access to conserved alternative splicing data and a tool to explore the evolution of this important biological process.
Bahar Taneri, Alexey Novoradovsky, Ben Snyder, Terry Gaasterland
Backmatter
Metadata
Title
Comparative Genomics
Edited by
Jens Lagergren
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32290-0
Print ISBN
978-3-540-24455-4
DOI
https://doi.org/10.1007/b105486