ABSTRACT
We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. The system is notable for its high degree of end-to-end automation, which covers every stage of the data analysis pipeline: initial data access (from a remote sequencing center or database, via the Globus Online file transfer system); on-demand resource acquisition (on Amazon EC2, via the Globus Provision cloud manager); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); and efficient scheduling of these pipelines over many processors (via the Condor scheduler). Biomedical researchers can thus perform rapid, fully automated analysis of large NGS datasets using just a web browser, without installing any software.
Index Terms
- Experiences in building a next-generation sequencing analysis service using Galaxy, Globus Online and Amazon Web Service