2019 | Original Paper | Book Chapter
Design of a High-Performance Tensor-Vector Multiplication with BLAS
Author: Cem Bassoy
Published in: Computational Science – ICCS 2019
Abstract
Based on the Loops-over-GEMMs (LOG) approach, this paper discusses the design of high-performance algorithms for the mode-q tensor-vector multiplication using efficient implementations of the matrix-vector multiplication (GEMV). Given dense tensors with any non-hierarchical storage format, tensor order, and dimensions, the proposed algorithms either call GEMV directly on the tensor or recursively apply GEMV to higher-order tensor slices multiple times. We analyze strategies for loop fusion and parallel execution of slice-vector multiplications with higher-order tensor slices. Using OpenBLAS, our parallel implementation attains 34.8 Gflop/s in single precision on an Intel Core i9-7900X processor. Our parallel version of the tensor-vector multiplication is on average 6.1x and up to 12.6x faster than state-of-the-art approaches.
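To make the idea of expressing a mode-q tensor-vector product through GEMV calls concrete, the following is a minimal sketch and not the paper's implementation. It assumes a dense tensor stored in first-order (column-major) layout as a flat list, and it uses a plain-Python `gemv` helper as a stand-in for a BLAS routine such as the one OpenBLAS provides. The names `gemv` and `tensor_times_vector` are hypothetical. The general recursive variant for other storage formats and the parallel execution discussed in the abstract are not shown.

```python
from math import prod


def gemv(a, offset, rows, cols, x, transpose=False):
    """Hypothetical stand-in for BLAS GEMV.

    Treats a[offset:] as a column-major rows x cols matrix A and
    returns A @ x, or A^T @ x when transpose is True.
    """
    if transpose:
        return [sum(a[offset + i + rows * j] * x[i] for i in range(rows))
                for j in range(cols)]
    return [sum(a[offset + i + rows * j] * x[j] for j in range(cols))
            for i in range(rows)]


def tensor_times_vector(a, shape, q, b):
    """Mode-q product (q is 1-based) for a column-major tensor.

    Assumption: first-order storage, so the tensor can be viewed as a
    3-D array of extents n_left x n_q x n_right, where n_left and
    n_right are the products of the extents before and after mode q.
    The result is returned as a flat column-major list with mode q removed.
    """
    n_q = shape[q - 1]
    assert len(a) == prod(shape) and len(b) == n_q
    n_left = prod(shape[:q - 1])
    n_right = prod(shape[q:])

    if n_left == 1:
        # q == 1: the whole tensor is an n_q x n_right matrix, c = A^T b.
        return gemv(a, 0, n_q, n_right, b, transpose=True)

    # Otherwise: one GEMV per n_left x n_q matrix slice.
    # For q == p (n_right == 1) this loop degenerates to a single GEMV.
    c = []
    for k in range(n_right):
        c.extend(gemv(a, k * n_left * n_q, n_left, n_q, b))
    return c


# Example: a 2 x 3 x 2 tensor with entries A[i, j, k] = i + 2*j + 6*k.
A = list(range(12))
c_mode2 = tensor_times_vector(A, (2, 3, 2), 2, [1, 1, 1])
```

In this sketch, the first and last modes each reduce to a single GEMV call over the whole tensor, while the inner modes loop over contiguous matrix slices. That loop is the part where loop fusion and parallel execution would apply in an optimized implementation.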