DOI: 10.1145/2616498.2616551
research-article

Evaluating Distributed Platforms for Protein-Guided Scientific Workflow

Published: 13 July 2014

ABSTRACT

Complex and large-scale applications in many scientific disciplines are often represented as a set of independent tasks, known as a workflow. Many scientific workflows have intensive resource requirements, so a variety of distributed platforms, including campus clusters, grids, and clouds, are used to execute them efficiently. In this paper we examine the performance and the cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, the protein-guided assembly approach, on three different execution platforms: Sandhills, the University of Nebraska campus cluster; the academic grid Open Science Grid (OSG); and the commercial cloud Amazon EC2. Furthermore, we tested the behavior of the blast2cap3 workflow with different numbers of tasks. For each workflow and execution platform, we performed multiple runs to compare the total workflow running time as well as resource availability over time. Additionally, for the most informative runs, we analyzed the number of running versus idle jobs over time on each platform. The experiments show that splitting the Pegasus WMS implementation of blast2cap3 into more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, Amazon EC2 delivered better performance and resource usage than the academic platforms. However, because of the cost of Amazon EC2, the academic distributed systems can sometimes be a good alternative, offering excellent performance, especially when plenty of resources are available.
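The key experimental variable above is the number of tasks the blast2cap3 workflow is split into. As an illustration only, the Python sketch below shows one way such a split could work: transcripts are grouped by the protein they hit in a BLAST tabular report, and whole protein clusters are then dealt round-robin into `n_tasks` task inputs, so each task can be assembled independently. The column layout (BLAST tabular output, query ID then subject ID), the helper names `group_by_protein` and `partition_clusters`, and the round-robin policy are all assumptions for this sketch, not the paper's actual implementation.

```python
from collections import defaultdict


def group_by_protein(blast_lines):
    """Group transcript IDs by the protein they align to (hypothetical helper).

    Assumes BLAST tabular output, where column 1 is the query (transcript)
    ID and column 2 is the subject (protein) ID. Only the first hit listed
    for each transcript is kept.
    """
    clusters = defaultdict(list)
    seen = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        transcript, protein = fields[0], fields[1]
        if transcript in seen:
            continue  # keep only the first (typically best-scoring) hit
        seen.add(transcript)
        clusters[protein].append(transcript)
    return dict(clusters)


def partition_clusters(clusters, n_tasks):
    """Deal protein clusters round-robin into n_tasks task inputs (hypothetical).

    Each cluster stays whole, so a task never needs transcripts that were
    assigned to another task.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    tasks = [[] for _ in range(n_tasks)]
    for i, protein in enumerate(sorted(clusters)):
        tasks[i % n_tasks].append((protein, clusters[protein]))
    return tasks


if __name__ == "__main__":
    report = [
        "tr1\tprotA\t98.0",
        "tr2\tprotA\t95.1",
        "tr3\tprotB\t99.2",
        "tr3\tprotC\t80.0",
        "tr4\tprotC\t91.7",
    ]
    clusters = group_by_protein(report)
    for k, task in enumerate(partition_clusters(clusters, 2)):
        print(k, task)
```

Keeping each protein cluster intact is what lets the tasks run without communicating; increasing `n_tasks` makes each task smaller, which is the kind of knob the experiments vary when comparing runs with different numbers of tasks.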
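The running-versus-idle analysis can be made concrete with a small Python sketch that counts, at each sampled time, how many jobs are idle (submitted but not yet started) and how many are running (started but not yet finished). The `JobRecord` fields and the fixed sampling times are hypothetical; in practice the timestamps would come from the workflow system's job logs, whose format is not described here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JobRecord:
    """Hypothetical per-job timestamps, in seconds since workflow start."""
    submit: float
    start: float
    end: float


def idle_running_counts(
    jobs: List[JobRecord], times: List[float]
) -> List[Tuple[float, int, int]]:
    """Return (t, idle, running) for each sample time t.

    A job counts as idle on [submit, start) and as running on [start, end).
    """
    result = []
    for t in times:
        idle = sum(1 for j in jobs if j.submit <= t < j.start)
        running = sum(1 for j in jobs if j.start <= t < j.end)
        result.append((t, idle, running))
    return result


if __name__ == "__main__":
    jobs = [
        JobRecord(submit=0, start=0, end=50),
        JobRecord(submit=0, start=20, end=60),
        JobRecord(submit=10, start=55, end=90),
    ]
    for t, idle, running in idle_running_counts(jobs, [0, 15, 30, 58, 80]):
        print(f"t={t:>3}s idle={idle} running={running}")
```

Plotting these two counts over time shows when a platform is short of resources: a high idle count alongside a flat running count means jobs are queued waiting for execution slots.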


                Published in

                  ACM Other conferences
                  XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
                  July 2014
                  445 pages
                  ISBN: 9781450328937
                  DOI: 10.1145/2616498
                  General Chair: Scott Lathrop
                  Program Chair: Jay Alameda

                  Copyright © 2014 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States



                  Qualifiers

                  • research-article
                  • Research
                  • Refereed limited

                  Acceptance Rates

                  XSEDE '14 paper acceptance rate: 80 of 120 submissions, 67%. Overall acceptance rate: 129 of 190 submissions, 68%.