Skip to main content

2017 | OriginalPaper | Buchkapitel

Porting Referential Genome Compression Tool on Loongson Platform

verfasst von : Zheng Du, Chao Guo, Yijun Zhang, Qiuming Luo

Erschienen in: Parallel Architecture, Algorithm and Programming

Verlag: Springer Singapore

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

With the fast development of genome sequencing technology, genome sequencing become faster and affordable. Consequently, genomic scientists are now facing an explosive increase of genomic data. Managing, storing and analyzing this quickly growing amount of data is challenging. It is desirable to apply some compression techniques to reduce storage and transferring cost. Referential genome compression is one of these techniques, which exploited the highly similarity of the same or an evolutionary close species (e.g., two randomly selected humans have at least 99% of genetic similarity) and store only the differences between the compressed file and well-known reference genome sequence. In this paper, we port two referential compression algorithm to Loongson platform and profiling their performance. And we use multi-process technology to improve the speed of compression.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., et al.: Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001)CrossRef Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., et al.: Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001)CrossRef
3.
Zurück zum Zitat Reuter, J.A., Spacek, D.V., Snyder, M.P.: High-throughput sequencing technologies. Mol. Cell 58, 586–597 (2015)CrossRef Reuter, J.A., Spacek, D.V., Snyder, M.P.: High-throughput sequencing technologies. Mol. Cell 58, 586–597 (2015)CrossRef
4.
Zurück zum Zitat Joly, Y., Dove, E.S., Knoppers, B.M., Bobrow, M., Chalmers, D.: Data sharing in the post-genomic world: the experience of the International Cancer Genome Consortium (ICGC) Data Access Compliance Office (DACO). PLoS Comput. Biol. 8, e1002549 (2012)CrossRef Joly, Y., Dove, E.S., Knoppers, B.M., Bobrow, M., Chalmers, D.: Data sharing in the post-genomic world: the experience of the International Cancer Genome Consortium (ICGC) Data Access Compliance Office (DACO). PLoS Comput. Biol. 8, e1002549 (2012)CrossRef
5.
Zurück zum Zitat Collins, F.S., Barker, A.D.: Mapping the cancer genome. Sci. Am. 296, 50–57 (2007)CrossRef Collins, F.S., Barker, A.D.: Mapping the cancer genome. Sci. Am. 296, 50–57 (2007)CrossRef
6.
Zurück zum Zitat ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)CrossRef ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)CrossRef
7.
Zurück zum Zitat Kahn, S.D.: On the future of genomic data. Science 331, 728–729 (2011)CrossRef Kahn, S.D.: On the future of genomic data. Science 331, 728–729 (2011)CrossRef
8.
Zurück zum Zitat Nalbantoglu, Ö.U., Russell, D.J., Sayood, K.: Data compression concepts and algorithms and their applications to bioinformatics. Entropy 12, 34–52 (2009)CrossRef Nalbantoglu, Ö.U., Russell, D.J., Sayood, K.: Data compression concepts and algorithms and their applications to bioinformatics. Entropy 12, 34–52 (2009)CrossRef
9.
Zurück zum Zitat Antoniou, D., Theodoridis, E., Tsakalidis, A.: Compressing biological sequences using self adjusting data structures. In: 2010 10th IEEE International Conference on Information Technology and Applications in Biomedicine (ITAB), pp. 1–5 (2010) Antoniou, D., Theodoridis, E., Tsakalidis, A.: Compressing biological sequences using self adjusting data structures. In: 2010 10th IEEE International Conference on Information Technology and Applications in Biomedicine (ITAB), pp. 1–5 (2010)
10.
Zurück zum Zitat Grumbach, S., Tahi, F.: A new challenge for compression algorithms: genetic sequences. Inf. Process. Manag. 30, 875–886 (1994)CrossRefMATH Grumbach, S., Tahi, F.: A new challenge for compression algorithms: genetic sequences. Inf. Process. Manag. 30, 875–886 (1994)CrossRefMATH
11.
Zurück zum Zitat Bose, T., Mohammed, M.H., Dutta, A., Mande, S.S.: BIND–an algorithm for loss-less compression of nucleotide sequence data. J. Biosci. 37, 785–789 (2012)CrossRef Bose, T., Mohammed, M.H., Dutta, A., Mande, S.S.: BIND–an algorithm for loss-less compression of nucleotide sequence data. J. Biosci. 37, 785–789 (2012)CrossRef
12.
Zurück zum Zitat Cao, M.D., Dix, T.I., Allison, L., Mears, C.: A simple statistical algorithm for biological sequence compression. In: 2007 Data Compression Conference, DCC 2007, pp. 43–52 (2007) Cao, M.D., Dix, T.I., Allison, L., Mears, C.: A simple statistical algorithm for biological sequence compression. In: 2007 Data Compression Conference, DCC 2007, pp. 43–52 (2007)
13.
Zurück zum Zitat Deorowicz, S., Grabowski, S.: Robust relative compression of genomes with random access. Bioinformatics 27, 2979–2986 (2011)CrossRef Deorowicz, S., Grabowski, S.: Robust relative compression of genomes with random access. Bioinformatics 27, 2979–2986 (2011)CrossRef
14.
Zurück zum Zitat Wandelt, S., Leser, U.: FRESCO: referential compression of highly similar sequences. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 1275–1288 (2013)CrossRef Wandelt, S., Leser, U.: FRESCO: referential compression of highly similar sequences. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 1275–1288 (2013)CrossRef
15.
Zurück zum Zitat Alves, F., Cogo, V., Wandelt, S., Leser, U., Bessani, A.: On-demand indexing for referential compression of DNA sequences. PLoS ONE 10, e0132460 (2015)CrossRef Alves, F., Cogo, V., Wandelt, S., Leser, U., Bessani, A.: On-demand indexing for referential compression of DNA sequences. PLoS ONE 10, e0132460 (2015)CrossRef
16.
17.
Zurück zum Zitat Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. IRE 40, 1098–1101 (1952)CrossRefMATH Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. IRE 40, 1098–1101 (1952)CrossRefMATH
18.
Zurück zum Zitat Pinho, A.J., Ferreira, P.J., Neves, A.J., Bastos, C.A.: On the representability of complete genomes by multiple competing finite-context (Markov) models. PLoS ONE 6, e21588 (2011)CrossRef Pinho, A.J., Ferreira, P.J., Neves, A.J., Bastos, C.A.: On the representability of complete genomes by multiple competing finite-context (Markov) models. PLoS ONE 6, e21588 (2011)CrossRef
19.
Zurück zum Zitat Rajarajeswari, P., Apparao, A.: DNABIT compress-genome compression algorithm. Bioinformation 5, 350–360 (2011)CrossRef Rajarajeswari, P., Apparao, A.: DNABIT compress-genome compression algorithm. Bioinformation 5, 350–360 (2011)CrossRef
20.
Zurück zum Zitat Kuruppu, S., Beresford-Smith, B., Conway, T., Zobel, J.: Iterative dictionary construction for compression of large DNA data sets. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 9, 137–149 (2012)CrossRef Kuruppu, S., Beresford-Smith, B., Conway, T., Zobel, J.: Iterative dictionary construction for compression of large DNA data sets. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 9, 137–149 (2012)CrossRef
21.
Zurück zum Zitat Pratas, D., Pinho, A.J.: Compressing the human genome using exclusively Markov models. In: Rocha, M.P., Rodríguez, J.M.C., Fdez-Riverola, F., Valencia, A. (eds.) PACBB 2011, pp. 213–220. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19914-1_29 Pratas, D., Pinho, A.J.: Compressing the human genome using exclusively Markov models. In: Rocha, M.P., Rodríguez, J.M.C., Fdez-Riverola, F., Valencia, A. (eds.) PACBB 2011, pp. 213–220. Springer, Heidelberg (2011). doi:10.​1007/​978-3-642-19914-1_​29
22.
Zurück zum Zitat Wang, C., Zhang, D.: A novel compression tool for efficient storage of genome resequencing data. Nucleic Acids Res. 39, e45 (2011)CrossRef Wang, C., Zhang, D.: A novel compression tool for efficient storage of genome resequencing data. Nucleic Acids Res. 39, e45 (2011)CrossRef
23.
Zurück zum Zitat Saha, S., Rajasekaran, S.: ERGC: an efficient referential genome compression algorithm. Bioinformatics, btv399 (2015) Saha, S., Rajasekaran, S.: ERGC: an efficient referential genome compression algorithm. Bioinformatics, btv399 (2015)
24.
Zurück zum Zitat Li, R., Yu, C., Li, Y., Lam, T.-W., Yiu, S.-M., Kristiansen, K., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009)CrossRef Li, R., Yu, C., Li, Y., Lam, T.-W., Yiu, S.-M., Kristiansen, K., et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009)CrossRef
25.
Zurück zum Zitat Luo, Q., Liu, G., Ming, Z., Xiao, F.: Porting and optimizing SOAP2 on Loongson Architecture. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 566–570 (2015) Luo, Q., Liu, G., Ming, Z., Xiao, F.: Porting and optimizing SOAP2 on Loongson Architecture. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 566–570 (2015)
26.
Zurück zum Zitat Genomes Project Consortium: An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)CrossRef Genomes Project Consortium: An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)CrossRef
Metadaten
Titel
Porting Referential Genome Compression Tool on Loongson Platform
verfasst von
Zheng Du
Chao Guo
Yijun Zhang
Qiuming Luo
Copyright-Jahr
2017
Verlag
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-6442-5_43

Neuer Inhalt