skip to main content
article

The greedy path-merging algorithm for contig scaffolding

Published:01 September 2002Publication History
Skip Abstract Section

Abstract

Given a collection of contigs and mate-pairs. The Contig Scaffolding Problem is to order and orientate the given contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic called the greedy-path merging algorithm for solving this problem. The method was originally developed as a key component of the compartmentalized assembly strategy developed at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly based on preliminary whole genome shotgun data produced at Celera and preliminary human contigs produced by the Human Genome Project.

References

  1. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A., and Wheeler, D. L. 2000. Genbank. Nuc. Acids Res. 28, 1, 15--8.Google ScholarGoogle Scholar
  2. Bevington, P. R. 1969. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, Inc., New York.Google ScholarGoogle Scholar
  3. Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability, A Guide to the Theory of NP-completeness. Bell Telephone Laboratories, Inc. Google ScholarGoogle Scholar
  4. Green, P. 1994. Documentation for Phrap. http://bozeman.mbt.washington.edu/phrap.docs/phrap.html.Google ScholarGoogle Scholar
  5. Huson, D. H., Reinert, K., Kravitz, S. A., Remington, K. A., Delcher, A. L., Dew, I. M., Flanigan, M., Halpern, A. L., Lai, Z., Mobarry, C. M., Sutton, G. G., and Myers, E. W. 2001. Design of a compartmentalized shotgun assembler for the human genome. Bioinformatics (Proceedings of ISMB 2001) 17, 132--139.Google ScholarGoogle Scholar
  6. Huson, D. H., Reinert, K., and Myers, E. W. 2001b. The greedy path-merging algorithm for sequence assembly. In Proceedings of the 5th Annual International Conference on Computational Molecular Biology (RECOMB-01), pp. 157--163. Google ScholarGoogle Scholar
  7. International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409, 6822, 860--921.Google ScholarGoogle Scholar
  8. Lander, E. S., and Waterman, M. S. 1988. Genomic mapping by fingerprinting random clones: A mathematical analysis. Genomics 2, 231--239.Google ScholarGoogle Scholar
  9. Myers, E. W., Sutton, G. G., Delcher, A. L., Dew, I. M., Fasulo, D. P., Flanigan, M. J., Kravitz, S. A., Mobarry, C. M., Reinert, K. H. J., Remington, K. A., Anson, E. L., Bolanos, R. A., Chou, H.-H., Jordan, C. M., Halpern, A. L., Lonardi, S., Beasley, E. M., Brandon, R. C., Chen, L., Dunn, P. J., Lai, Z., Liang, Y., Nusskern, D. R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G. M., Adams, M. D., and Venter, J. C. 2000. A whole-genome assembly of Drosophila. Science 287, 2196--2204.Google ScholarGoogle Scholar
  10. Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., and Petersen, G. B. 1992. Nucleotide sequence of bacteriophage λ DNA. J. Mol. Bio. 162, 4, 729--773.Google ScholarGoogle Scholar
  11. Sanger, F., Nicklen, S., and Coulson, A. R. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Nat. Acad. Sci. 74, 12, 5463--5467.Google ScholarGoogle Scholar
  12. U. S. Department of Energy, Office of Energy Research, and Office of Biological and Environmental Research. 1997. Human genome program report. http://www. ornl.gov/hgmis/publicat/97pr/.Google ScholarGoogle Scholar
  13. Venter, J. C., Adams, M. D., Myers, E. W., et al. 2001. The sequence of the human genome. Science 291, 1145--1434.Google ScholarGoogle Scholar
  14. Webber, J. L., and Myers, E. W. 1997. Human whole-genome shotgun sequencing. Gen. Res. 7, 5, 401--409.Google ScholarGoogle Scholar

Index Terms

  1. The greedy path-merging algorithm for contig scaffolding

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Journal of the ACM
            Journal of the ACM  Volume 49, Issue 5
            September 2002
            137 pages
            ISSN:0004-5411
            EISSN:1557-735X
            DOI:10.1145/585265
            Issue’s Table of Contents

            Copyright © 2002 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 September 2002
            Published in jacm Volume 49, Issue 5

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Author Tags

            Qualifiers

            • article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader