Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi

  1. Jong Hyun Kim1,4,
  2. Michael S. Waterman2,3, and
  3. Lei M. Li2,3
  1. 1 Department of Computer Science, Yonsei University, Seoul, 120-749, Republic of Korea;
  2. 2 Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA;
  3. 3 Department of Mathematics, University of Southern California, Los Angeles, California 90089, USA

Abstract

One of the main goals of genome sequencing projects is to determine a haploid consensus sequence, even when clone libraries are constructed from homologous chromosomes. However, haplotypes can be inferred from genome assemblies by examining phase conservation in sequenced reads. In this study, we infer haplotypes, that is, a diploid consensus sequence, from the genome assembly of Ciona intestinalis. The Ciona intestinalis genome is an ideal resource for haplotype inference because of its high polymorphism rate (1.2%). The haplotype estimation scheme consists of polymorphism detection and phase estimation. The core step of our method is a Gibbs sampling procedure. Mate-pair information from clone inserts sequenced at both ends is exploited to provide long-range continuity. We estimate the polymorphism rate of Ciona intestinalis to be 1.2% or 1.5%, depending on which of two polymorphism counting schemes is used. The distribution of heterozygosity number is well fit by a compound Poisson distribution. The N50 length of haplotype segments is 37.9 kb in our assembly, while the N50 scaffold length of the Ciona intestinalis assembly is 190 kb. We also infer diploid gene sequences from haplotype segments. According to our reconstruction, 85.4% of predicted gene sequences are continuously covered by single haplotype segments. On a simulated data set, our results indicate 97% accuracy in haplotype estimation. We conduct a comparative analysis with Ciona savignyi and discover interesting patterns of conserved DNA elements in chordates.
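The N50 figures quoted above follow the standard definition: the length L such that segments of length at least L together cover at least half of the total length. A minimal sketch of that calculation is given here for orientation. The function name `n50` and the toy segment lengths are illustrative assumptions and are not taken from the hapBuild software or from the Ciona intestinalis data.

```python
def n50(lengths):
    """Return the N50 of a collection of segment lengths.

    N50 is the largest length L such that segments of length >= L
    together account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one segment length")
    ordered = sorted(lengths, reverse=True)  # longest segments first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical segment lengths in kb (made up for illustration only).
toy_segments_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_segments_kb))  # total is 300 kb; 80 + 70 reaches 150, so this prints 70
```

Because N50 is weighted by length, a few long haplotype segments can raise it substantially even when many short segments remain.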

Footnotes

  • 4 Corresponding author.

    4 E-mail jonghkim{at}usc.edu; fax (213) 740-2437.

  • [Supplemental material is available at www.genome.org. The diploid genome sequence of Ciona intestinalis is downloadable from http://www-rcf.usc.edu/~lilei/diploid.html. The software to reconstruct haplotypes, called hapBuild, is available on request.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5894107

    • Received August 24, 2006.
    • Accepted April 12, 2007.
  • Freely available online through the Genome Research Open Access option.
