Skip to main content

2018 | OriginalPaper | Buchkapitel

A BLAS-Based Algorithm for Finding Position Weight Matrix Occurrences in DNA Sequences on CPUs and GPUs

verfasst von : Jan Fostier

Erschienen in: Bioinformatics and Biomedical Engineering

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Finding all matches of a set of position weight matrices (PWMs) in large DNA sequences is a compute-intensive task. We propose a light-weight algorithm inspired by high performance computing techniques in which the problem of finding PWM occurrences is expressed in terms of matrix-matrix products which can be performed efficiently by highly optimized BLAS library implementations. The algorithm is easy to parallelize and implement on CPUs and GPUs. It is competitive on CPUs with state-of-the-art software for matching PWMs in terms of runtime while requiring far less memory. For example, both strands of the entire human genome can be scanned for 1404 PWMs in the JASPAR database in 41 min with a p-value of \(10^{-4}\) using a 24-core machine. On a dual GPU system, the same task can be performed in under 5 min.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Stormo, G.D.: DNA binding sites: representation and discovery. Bioinformatics 16(1), 16–23 (2000)CrossRef Stormo, G.D.: DNA binding sites: representation and discovery. Bioinformatics 16(1), 16–23 (2000)CrossRef
2.
Zurück zum Zitat Dorohonceanu, B., Nevill-Manning, C.G.: Accelerating protein classification using suffix trees. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, 19–23 August 2000, La Jolla/San Diego, CA, USA, pp. 128–133 (2000) Dorohonceanu, B., Nevill-Manning, C.G.: Accelerating protein classification using suffix trees. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, 19–23 August 2000, La Jolla/San Diego, CA, USA, pp. 128–133 (2000)
3.
Zurück zum Zitat Beckstette, M., Homann, R., Giegerich, R., Kurtz, S.: Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinf. 7(1), 389+ (2006)CrossRef Beckstette, M., Homann, R., Giegerich, R., Kurtz, S.: Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinf. 7(1), 389+ (2006)CrossRef
6.
Zurück zum Zitat Pizzi, C., Rastas, P., Ukkonen, E.: Finding significant matches of position weight matrices in linear time. IEEE/ACM Trans. Comput. Biol. Bioinf. 8(1), 69–79 (2011)CrossRef Pizzi, C., Rastas, P., Ukkonen, E.: Finding significant matches of position weight matrices in linear time. IEEE/ACM Trans. Comput. Biol. Bioinf. 8(1), 69–79 (2011)CrossRef
7.
Zurück zum Zitat Korhonen, J., Martinmäki, P., Pizzi, C., Rastas, P., Ukkonen, E.: MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25(23), 3181–3182 (2009)CrossRef Korhonen, J., Martinmäki, P., Pizzi, C., Rastas, P., Ukkonen, E.: MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25(23), 3181–3182 (2009)CrossRef
8.
Zurück zum Zitat Giraud, M., Varré, J.S.: Parallel position weight matrices algorithms. Parallel Comput. 37(8), 466–478 (2011)CrossRef Giraud, M., Varré, J.S.: Parallel position weight matrices algorithms. Parallel Comput. 37(8), 466–478 (2011)CrossRef
9.
Zurück zum Zitat Dongarra, J.J., Du Croz, J., Hammarling, S., Duff, I.S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16(1), 1–17 (1990)CrossRefMATH Dongarra, J.J., Du Croz, J., Hammarling, S., Duff, I.S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16(1), 1–17 (1990)CrossRefMATH
10.
Zurück zum Zitat Whaley, R.C., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, SC 1998. IEEE Computer Society, Washington, DC, USA, pp. 1–27 (1998) Whaley, R.C., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, SC 1998. IEEE Computer Society, Washington, DC, USA, pp. 1–27 (1998)
11.
Zurück zum Zitat Goto, K., van de Geijn, R.A.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3), 12:1–12:25 (2008)MathSciNetCrossRefMATH Goto, K., van de Geijn, R.A.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3), 12:1–12:25 (2008)MathSciNetCrossRefMATH
12.
Zurück zum Zitat Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2013) Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2013)
13.
Zurück zum Zitat Mathelier, A., Fornes, O., Arenillas, D.J., Chen, C.Y.Y., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., Zhang, A.W., Parcy, F., Lenhard, B., Sandelin, A., Wasserman, W.W.: JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44(D1), D110–D115 (2016)CrossRef Mathelier, A., Fornes, O., Arenillas, D.J., Chen, C.Y.Y., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., Zhang, A.W., Parcy, F., Lenhard, B., Sandelin, A., Wasserman, W.W.: JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44(D1), D110–D115 (2016)CrossRef
Metadaten
Titel
A BLAS-Based Algorithm for Finding Position Weight Matrix Occurrences in DNA Sequences on CPUs and GPUs
verfasst von
Jan Fostier
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-78723-7_38