2015 | OriginalPaper | Chapter

Exploiting the Parallel Execution of Homology Workflow Alternatives in HPC Compute Clouds

Authors: Kary A. C. S. Ocaña, Daniel de Oliveira, Vítor Silva, Silvia Benza, Marta Mattoso

Published in: Service-Oriented Computing - ICSOC 2014 Workshops

Publisher: Springer International Publishing

Abstract

Homology modeling (HM) plays an important role in drug discovery. HM analysis aims to predict a 3D model from a biological sequence in order to discover new drugs. Executing an HM analysis at large scale raises several problems: multiple software packages have to be evaluated, the parallel execution has to be managed, and the results have to be analyzed, e.g. by manually browsing all results to find which structure was derived from which program with good quality. A Scientific Workflow Management System (SWfMS) with parallelism and provenance support can aid large-scale HM executions by addressing the result analysis. However, before the HM workflow is submitted for execution, it has to be specified along with its several alternatives (also called variants), as considered in this paper. Managing HM workflow variants is a complex task even with the help of a SWfMS. In this paper, we propose SciSamma (Structural Approach and Molecular Modeling Analyses), an abstract representation of HM workflows inspired by the concept of software product lines (SPL). SciSamma models HM workflow variants that execute with parallel processing in the cloud using the SciCumulus SWfMS. We evaluated SciSamma with two common variants using 100 protease enzymes of protozoan genomes. Both variants scaled, with performance improvements (execution time dropped from 8 h to 27 min using 32 large Amazon virtual machines). Provenance queries showed that the two workflow variants deliver the same quality of biological results, while their execution times differ by around 40%.
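To put the reported timings in perspective, the short calculation below derives the speedup and parallel efficiency implied by the abstract's figures (8 h versus 27 min on 32 virtual machines). It is only a sketch: the abstract does not state the baseline configuration, so treating the 8 h run as a single-VM baseline is an assumption, and the function names are illustrative rather than part of SciSamma or SciCumulus.

```python
# Figures taken from the abstract; the single-VM baseline is an ASSUMPTION,
# since the abstract does not say how the 8 h run was configured.
BASELINE_MINUTES = 8 * 60   # reported execution time before scaling out
PARALLEL_MINUTES = 27       # reported execution time on 32 large VMs
NUM_VMS = 32


def speedup(baseline: float, parallel: float) -> float:
    """Ratio of baseline time to parallel time."""
    return baseline / parallel


def efficiency(speedup_value: float, workers: int) -> float:
    """Speedup divided by the number of workers (here: VMs, not cores)."""
    return speedup_value / workers


s = speedup(BASELINE_MINUTES, PARALLEL_MINUTES)
e = efficiency(s, NUM_VMS)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.2f}")
```

Under that assumption the run is roughly 17.8 times faster, which corresponds to a per-VM efficiency of about 0.56. Counting cores instead of VMs would give a different efficiency figure.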

Metadata
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-22885-3_29
