Abstract
Given a collection of contigs and mate-pairs. The Contig Scaffolding Problem is to order and orientate the given contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic called the greedy-path merging algorithm for solving this problem. The method was originally developed as a key component of the compartmentalized assembly strategy developed at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly based on preliminary whole genome shotgun data produced at Celera and preliminary human contigs produced by the Human Genome Project.
- Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A., and Wheeler, D. L. 2000. Genbank. Nuc. Acids Res. 28, 1, 15--8.Google Scholar
- Bevington, P. R. 1969. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, Inc., New York.Google Scholar
- Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability, A Guide to the Theory of NP-completeness. Bell Telephone Laboratories, Inc. Google Scholar
- Green, P. 1994. Documentation for Phrap. http://bozeman.mbt.washington.edu/phrap.docs/phrap.html.Google Scholar
- Huson, D. H., Reinert, K., Kravitz, S. A., Remington, K. A., Delcher, A. L., Dew, I. M., Flanigan, M., Halpern, A. L., Lai, Z., Mobarry, C. M., Sutton, G. G., and Myers, E. W. 2001. Design of a compartmentalized shotgun assembler for the human genome. Bioinformatics (Proceedings of ISMB 2001) 17, 132--139.Google Scholar
- Huson, D. H., Reinert, K., and Myers, E. W. 2001b. The greedy path-merging algorithm for sequence assembly. In Proceedings of the 5th Annual International Conference on Computational Molecular Biology (RECOMB-01), pp. 157--163. Google Scholar
- International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409, 6822, 860--921.Google Scholar
- Lander, E. S., and Waterman, M. S. 1988. Genomic mapping by fingerprinting random clones: A mathematical analysis. Genomics 2, 231--239.Google Scholar
- Myers, E. W., Sutton, G. G., Delcher, A. L., Dew, I. M., Fasulo, D. P., Flanigan, M. J., Kravitz, S. A., Mobarry, C. M., Reinert, K. H. J., Remington, K. A., Anson, E. L., Bolanos, R. A., Chou, H.-H., Jordan, C. M., Halpern, A. L., Lonardi, S., Beasley, E. M., Brandon, R. C., Chen, L., Dunn, P. J., Lai, Z., Liang, Y., Nusskern, D. R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G. M., Adams, M. D., and Venter, J. C. 2000. A whole-genome assembly of Drosophila. Science 287, 2196--2204.Google Scholar
- Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., and Petersen, G. B. 1992. Nucleotide sequence of bacteriophage λ DNA. J. Mol. Bio. 162, 4, 729--773.Google Scholar
- Sanger, F., Nicklen, S., and Coulson, A. R. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Nat. Acad. Sci. 74, 12, 5463--5467.Google Scholar
- U. S. Department of Energy, Office of Energy Research, and Office of Biological and Environmental Research. 1997. Human genome program report. http://www. ornl.gov/hgmis/publicat/97pr/.Google Scholar
- Venter, J. C., Adams, M. D., Myers, E. W., et al. 2001. The sequence of the human genome. Science 291, 1145--1434.Google Scholar
- Webber, J. L., and Myers, E. W. 1997. Human whole-genome shotgun sequencing. Gen. Res. 7, 5, 401--409.Google Scholar
Index Terms
- The greedy path-merging algorithm for contig scaffolding
Recommendations
The greedy path-merging algorithm for sequence assembly
RECOMB '01: Proceedings of the fifth annual international conference on Computational biologyTwo different approaches to determining the human genome are currently being pursued: one is the “clone-by-clone” approach, employed by the publicly-funded. Human Genome Project, and the other is the “whole genome shotgun” approach, favored by ...
From NGS assembly challenges to instability of fungal mitochondrial genomes
Graphical abstractMitochondrial genomes can contain repeat landscapes ranging from notable absence of repeats, as in human and fission yeast, to rich and complex repeat systems as in baker's yeast. In this article we characterize exact repetitions of 17-...
Comments