
2017 | OriginalPaper | Book chapter

Scalability of a Genomic Data Analysis in the BioTest Platform

Authors: Krzysztof Psiuk-Maksymowicz, Dariusz Mrozek, Roman Jaksik, Damian Borys, Krzysztof Fujarewicz, Andrzej Swierniak

Published in: Intelligent Information and Database Systems

Publisher: Springer International Publishing


Abstract

The BioTest platform is dedicated to processing biomedical data that originate from various measurement techniques. These include next-generation sequencing (NGS), which has drawn the attention of researchers all over the world because of its broad capabilities for determining the structure of DNA and RNA. However, the analysis of NGS data requires large amounts of disk space and is time-consuming, which makes it a challenge for data processing systems. In this paper, we analyze the possibility of scaling the BioTest platform in terms of genomic data analysis and platform architecture. Scalability tests were carried out on next-generation sequencing data and relied on methods for detecting somatic mutations and polymorphisms in human DNA. Our results show that the platform is scalable, which allows the execution time of the calculations to be reduced significantly. However, the achievable scalability depends on the experiment methodology and on the homogeneity of the resources required by each task, which can be highly variable in NGS studies.
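
The dependence of scalability on task homogeneity can be illustrated with a small sketch. It is not the BioTest scheduler or the authors' test setup: the greedy assignment policy, the function names `makespan` and `speedup`, and the task durations are all assumptions chosen for illustration. The sketch compares two workloads with the same total work on four workers, one with equal task durations and one dominated by a single long task.

```python
import heapq


def makespan(task_times, workers):
    """Wall-clock time when each task goes to the worker that frees up first.

    Hypothetical greedy policy, used only to illustrate the effect.
    """
    loads = [0.0] * workers          # accumulated busy time per worker
    heapq.heapify(loads)
    for t in task_times:
        earliest = heapq.heappop(loads)      # least-loaded worker
        heapq.heappush(loads, earliest + t)  # assign the task to it
    return max(loads)


def speedup(task_times, workers):
    """Sequential time divided by parallel wall-clock time."""
    return sum(task_times) / makespan(task_times, workers)


# Illustrative durations (arbitrary units); both workloads total 80.
homogeneous = [10.0] * 8
heterogeneous = [45.0] + [5.0] * 7

speedup_homogeneous = speedup(homogeneous, 4)
speedup_heterogeneous = speedup(heterogeneous, 4)
print(speedup_homogeneous, speedup_heterogeneous)
```

Under these assumptions the homogeneous workload reaches the ideal speedup of 4 on four workers, while the heterogeneous one is bounded by its longest task, so adding workers beyond a certain point no longer shortens the run.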


Metadata
Title
Scalability of a Genomic Data Analysis in the BioTest Platform
Authors
Krzysztof Psiuk-Maksymowicz
Dariusz Mrozek
Roman Jaksik
Damian Borys
Krzysztof Fujarewicz
Andrzej Swierniak
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-54430-4_71