2014 | Original Paper | Book Chapter

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Authors: Erik Saule, Kamer Kaya, Ümit V. Çatalyürek

Published in: Parallel Processing and Applied Mathematics

Publisher: Springer Berlin Heidelberg

Abstract

Intel Xeon Phi is a recently released high-performance coprocessor that features 61 cores, each supporting 4 hardware threads and 512-bit wide SIMD registers, for a theoretical peak performance of 1 Tflop/s in double precision. Its design differs from that of classical modern processors: it comes with a large number of cores, its 4-way hyperthreading capability allows many applications to saturate the massive memory bandwidth, and its large SIMD capabilities allow it to reach high computation throughput. The core of many scientific applications involves the multiplication of a large, sparse matrix with one or more dense vectors, an operation that is memory-bound rather than compute-bound. In this paper, we investigate the performance of the Xeon Phi coprocessor for these sparse linear algebra kernels. We highlight the important hardware details and show that Xeon Phi’s sparse kernel performance is very promising and even better than that of cutting-edge CPUs and GPUs.
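
To make the two kernels named in the abstract concrete, here is a minimal sketch of a sparse matrix times one dense vector (SpMV) and a sparse matrix times several dense vectors (SpMM). It assumes the matrix is stored in compressed sparse row (CSR) form, which the abstract does not specify. The function names, the 4-byte index width, and the "every array is read from memory exactly once" traffic model are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]  # indirect access into x
        y[i] = acc
    return y


def csr_spmm(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float],
             X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Compute Y = A @ X, where X holds k dense vectors as its columns."""
    n_rows = len(row_ptr) - 1
    k = len(X[0]) if X else 0
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        y_row = Y[i]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = values[p]
            x_row = X[col_idx[p]]
            # Each nonzero loaded once is reused for all k vectors.
            for j in range(k):
                y_row[j] += a * x_row[j]
    return Y


def spmv_flops_per_byte(nnz: int, n_rows: int, n_cols: int,
                        value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Rough flop:byte ratio of CSR SpMV under a simplified traffic model.

    Assumption: values, col_idx, row_ptr, x and y each cross the memory
    bus exactly once; cache misses on x would only lower the ratio.
    """
    flops = 2 * nnz  # one multiply and one add per nonzero
    bytes_moved = (nnz * (value_bytes + index_bytes)
                   + (n_rows + 1) * index_bytes
                   + n_cols * value_bytes
                   + n_rows * value_bytes)
    return flops / bytes_moved


# Small example: A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]

print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))
print(csr_spmm(row_ptr, col_idx, values, [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]))
print(spmv_flops_per_byte(nnz=5, n_rows=3, n_cols=3))
```

Under this model each double-precision nonzero costs at least 12 bytes of traffic for 2 floating-point operations, so the ratio stays well below one flop per byte. That is the sense in which the kernel is memory-bound. Multiplying by several vectors at once reuses each loaded nonzero, which raises the ratio.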

Metadata
Title
Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi
Authors
Erik Saule
Kamer Kaya
Ümit V. Çatalyürek
Copyright Year
2014
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-55224-3_52