2018 | OriginalPaper | Book chapter

Design Principles for Sparse Matrix Multiplication on the GPU

Authors: Carl Yang, Aydın Buluç, John D. Owens

Published in: Euro-Par 2018: Parallel Processing

Publisher: Springer International Publishing

Abstract

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices and that is crucial to achieving excellent performance on SpMM. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1× peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
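To make the data layout in the abstract concrete, here is a minimal sequential sketch of SpMM with the sparse matrix in CSR and the dense matrices stored row-major. It is not the authors' GPU kernel and does not include merge-based load-balancing. The function name `spmm_csr` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List


def spmm_csr(row_ptr: List[int], col_idx: List[int], vals: List[float],
             B: List[float], n_cols_B: int) -> List[float]:
    """Compute C = A @ B, with A in CSR and B, C as flat row-major lists.

    Hypothetical reference routine for illustration only.
    """
    n_rows = len(row_ptr) - 1
    C = [0.0] * (n_rows * n_cols_B)
    for i in range(n_rows):
        c_row = i * n_cols_B
        # Nonzeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[k]
            b_row = col_idx[k] * n_cols_B
            # Row-major storage: consecutive j touch consecutive addresses
            # in both B and C.
            for j in range(n_cols_B):
                C[c_row + j] += a * B[b_row + j]
    return C


# A = [[1, 0, 2],
#      [0, 0, 3]]  stored in CSR
row_ptr = [0, 2, 3]
col_idx = [0, 2, 2]
vals = [1.0, 2.0, 3.0]
# B = [[1, 2], [3, 4], [5, 6]] stored row-major
B = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

C = spmm_csr(row_ptr, col_idx, vals, B, 2)
print(C)  # [11.0, 14.0, 15.0, 18.0]
```

In this sketch the innermost loop walks one row of B and one row of C contiguously. On a GPU, assigning consecutive values of `j` to consecutive threads would plausibly give the coalesced access the abstract refers to, although how the paper actually maps work to threads is not stated on this page.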

https://doi.org/10.6084/m9.figshare.6378764.

https://github.com/owensgroup/merge-spmm.

Title: Design Principles for Sparse Matrix Multiplication on the GPU
Authors: Carl Yang, Aydın Buluç, John D. Owens
Publisher: Springer International Publishing
Book: Euro-Par 2018: Parallel Processing
Print ISBN: 978-3-319-96982-4

Electronic ISBN: 978-3-319-96983-1

Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-96983-1_48
