Cactus: Algorithms for genome multiple sequence alignment

  1. David Haussler1
  1. Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA

    Abstract

    Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new "Cactus" alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show, using these and existing simulations, that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.
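    As background for the term used above: in graph theory, a cactus is a connected graph in which every edge lies on at most one simple cycle. Equivalently, any two simple cycles share at most one vertex. The following is a minimal sketch that checks only this textbook property. It assumes a simple undirected graph whose vertices are labeled 0..n-1 with n >= 1. The helper name `is_cactus` is hypothetical and is not part of the Cactus program, and the sketch does not build the alignment-derived Cactus graphs that the program uses.

```python
def is_cactus(n, edges):
    """Return True if the simple undirected graph on vertices 0..n-1 (n >= 1)
    is connected and every edge lies on at most one simple cycle."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = [-1] * n
    depth = [-1] * n
    # Each DFS tree edge is identified by its child vertex x (edge x-parent[x]).
    # This set records the tree edges already claimed by some cycle.
    claimed = set()

    # Iterative DFS from vertex 0, which avoids Python's recursion limit.
    depth[0] = 0
    stack = [(0, iter(adj[0]))]
    while stack:
        u, neighbors = stack[-1]
        descended = False
        for w in neighbors:
            if depth[w] == -1:
                # Tree edge: descend into the unvisited neighbor.
                parent[w] = u
                depth[w] = depth[u] + 1
                stack.append((w, iter(adj[w])))
                descended = True
                break
            if w != parent[u] and depth[w] < depth[u]:
                # Back edge u-w to an ancestor closes exactly one cycle.
                # Claim every tree edge on the path from u up to w. A tree
                # edge claimed twice would lie on two different cycles.
                x = u
                while x != w:
                    if x in claimed:
                        return False
                    claimed.add(x)
                    x = parent[x]
        if not descended:
            stack.pop()

    # Any vertex the DFS never reached means the graph is disconnected.
    return all(d != -1 for d in depth)
```

    Two triangles joined at a single vertex form a cactus. A square with one diagonal does not, because the diagonal's endpoints lie on two cycles that share an edge. Because the function returns as soon as any tree edge is claimed a second time, the total work stays linear in the number of vertices plus edges.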

    Footnotes

    • 1 Corresponding authors.

      E-mail benedict{at}soe.ucsc.edu.

      E-mail haussler{at}soe.ucsc.edu.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.123356.111.

    • Received March 15, 2011.
    • Accepted June 6, 2011.

    Freely available online through the Genome Research Open Access option.
