
2019 | OriginalPaper | Chapter

DDP-B: A Distributed Dynamic Parallel Framework for Meta-genomics Binary Similarity

Authors: Mengxian Chi, Xu Jin, Feng Li, Hong An

Published in: Network and Parallel Computing

Publisher: Springer International Publishing


Abstract

Over the past decades, considerable effort has been devoted to meta-genomics for the discovery of new species. With the development of next-generation sequencing technology, meta-genomics datasets now range from tens to hundreds of gigabytes, or even several terabytes, which poses a severe challenge for data analysis. Moreover, conventional meta-genomics comparison algorithms lack parallelism and therefore may fail to take full advantage of the computing capacity offered by parallel computing techniques. In this paper, we propose DDP-B, a distributed dynamic parallel framework for meta-genomics binary similarity analysis, to overcome these limitations. Within this framework, we introduce a binary distance algorithm for measuring meta-genomics similarity and parallelize it at several levels of granularity using MPI, OpenMP, and SIMD techniques. We further design a dynamic scheduling method that dispatches parallel computing tasks asynchronously, and deploy the resulting dynamic parallel system on a distributed cluster. Under optimal conditions, the system compares 2.97K pairs of meta-genomics vectors per second and achieves a 134.79x speedup over the baseline. The framework scales stably when assigned larger workloads.
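The abstract does not state which binary distance DDP-B uses or how its scheduler is implemented. Purely as an illustration, the sketch below assumes the Jaccard distance over bit-packed presence/absence vectors (one common binary measure) and uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for dynamic task dispatch: every vector pair is a separate task in a shared queue, and idle workers pick up the next one. The function names `pack_bits`, `jaccard_distance`, and `all_pairs_distances` are hypothetical; this is not the authors' MPI/OpenMP/SIMD implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pack_bits(presence):
    """Pack a 0/1 presence-absence list into one int (bit i = presence[i])."""
    value = 0
    for i, bit in enumerate(presence):
        if bit:
            value |= 1 << i
    return value


def popcount(x):
    """Count set bits; a C version would use a hardware/SIMD popcount instead."""
    return bin(x).count("1")


def jaccard_distance(a, b):
    """Assumed measure: 1 - |A & B| / |A | B| on bit-packed vectors (0.0 if both empty)."""
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return 1.0 - popcount(a & b) / union


def all_pairs_distances(vectors, workers=4):
    """Compare every pair of vectors; each pair is an independent task."""
    packed = [pack_bits(v) for v in vectors]
    pairs = list(combinations(range(len(packed)), 2))

    def task(pair):
        i, j = pair
        return pair, jaccard_distance(packed[i], packed[j])

    # All pair tasks go into the executor's shared work queue; a worker thread
    # takes the next task as soon as it finishes its current one, which is a
    # simple form of dynamic (work-queue) scheduling.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, pairs))


if __name__ == "__main__":
    samples = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
    for pair, dist in sorted(all_pairs_distances(samples).items()):
        print(pair, round(dist, 4))
```

Packing vectors into machine words is what makes bitwise AND/OR plus popcount cheap, and in a compiled implementation those operations map naturally onto SIMD instructions. In Python the thread pool mainly illustrates the task-queue pattern rather than delivering real speedup, because of the global interpreter lock.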

Metadata
Title: DDP-B: A Distributed Dynamic Parallel Framework for Meta-genomics Binary Similarity
Authors: Mengxian Chi, Xu Jin, Feng Li, Hong An
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-30709-7_12
