Abstract
Recursive blocked data formats and recursive blocked BLAS’s are introduced and applied to dense linear algebra algorithms of the kind typified by LAPACK. The new data formats maintain data locality at every level of the memory hierarchy and hence provide high performance on today’s memory-tiered processors. The new data format is hybrid. It contains blocking parameters, chosen so that the associated submatrices of a block-partitioned matrix A fit into level 1 cache. The recursive part of the data format chooses a linear order of the blocks that maintains the two-dimensional data locality of A in a one-dimensional tiered memory structure. We argue that, out of the NB factorial possible orderings of the NB blocks, our recursive ordering is one of the best. This is because our algorithms are also recursive and do their computations on submatrices that follow the new recursive data structure definition. This is analogous to the well-known principle that the data structure should be matched to the algorithm. Performance results in support of our recursive approach are also presented.
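The hybrid format described above can be illustrated with a minimal sketch. The abstract does not give the exact block ordering or the layout inside a block, so everything below is an assumption made for illustration: the function names `recursive_block_order` and `pack_recursive_blocked`, the rule of halving the longer dimension of the block grid, and the column-major storage inside each block are all hypothetical. The blocking parameter `nb` stands in for a block size chosen so that one block fits into level 1 cache.

```python
def recursive_block_order(r0, r1, c0, c1):
    """Yield block coordinates (i, j) of the grid [r0, r1) x [c0, c1).

    Assumed rule (not taken from the paper): halve the longer
    dimension of the block grid and recurse into each half, so that
    blocks close together in 2-D stay close together in the 1-D order.
    """
    nr, nc = r1 - r0, c1 - c0
    if nr <= 0 or nc <= 0:
        return
    if nr == 1 and nc == 1:
        yield (r0, c0)
        return
    if nr >= nc:
        mid = r0 + nr // 2  # split the block rows
        yield from recursive_block_order(r0, mid, c0, c1)
        yield from recursive_block_order(mid, r1, c0, c1)
    else:
        mid = c0 + nc // 2  # split the block columns
        yield from recursive_block_order(r0, r1, c0, mid)
        yield from recursive_block_order(r0, r1, mid, c1)


def pack_recursive_blocked(a, nb):
    """Copy matrix `a` (list of rows) into a flat list, block by block.

    Blocks are visited in the recursive order above. Inside a block,
    elements are stored column-major (an assumption, chosen because
    the Fortran BLAS use column-major storage).
    """
    n_rows, n_cols = len(a), len(a[0])
    block_rows = -(-n_rows // nb)  # ceiling division
    block_cols = -(-n_cols // nb)
    packed = []
    for bi, bj in recursive_block_order(0, block_rows, 0, block_cols):
        for j in range(bj * nb, min((bj + 1) * nb, n_cols)):
            for i in range(bi * nb, min((bi + 1) * nb, n_rows)):
                packed.append(a[i][j])
    return packed


if __name__ == "__main__":
    a = [[10 * i + j for j in range(4)] for i in range(4)]
    print(pack_recursive_blocked(a, 2))
```

With this assumed rule, a grid whose side is a power of two is traversed quadrant by quadrant, so each recursive subproblem of a recursive algorithm touches one contiguous stretch of the flat array. Other recursive orderings are possible, and the one used in the paper may differ.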
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gustavson, F., Henriksson, A., Jonsson, I., Kågström, B., Ling, P. (1998). Recursive blocked data formats and BLAS’s for dense linear algebra algorithms. In: Kågström, B., Dongarra, J., Elmroth, E., Waśniewski, J. (eds) Applied Parallel Computing Large Scale Scientific and Industrial Problems. PARA 1998. Lecture Notes in Computer Science, vol 1541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095337
DOI: https://doi.org/10.1007/BFb0095337
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65414-8
Online ISBN: 978-3-540-49261-0
eBook Packages: Springer Book Archive