Abstract
A new compact way to store a symmetric or triangular matrix, called RPF for Recursive Packed Format, is fully described. Novel ways to transform RPF to and from standard packed format are included. A new algorithm, called RPC for Recursive Packed Cholesky, that operates on the RPF format is presented. Algorithm RPC is based on level-3 BLAS and requires variants of algorithms TRSM and SYRK that work on RPF. We call these RP_TRSM and RP_SYRK and find that they do most of their work by calling GEMM. It follows that most of the execution time of RPC lies in GEMM. The advantage of this storage scheme compared to traditional packed and full storage is demonstrated. First, the RPF storage format uses the minimal amount of storage for the symmetric or triangular matrix. Second, RPC gives a level-3 implementation of Cholesky factorization, whereas standard packed implementations are only level 2. Hence, the performance of our RPC implementation is decidedly superior. Third, unlike fixed-block-size algorithms, RPC requires no block-size tuning parameter. We present performance measurements on several current architectures that demonstrate improvements over the traditional packed routines. SMP parallel computations on the IBM SMP computer are also reported. The graphs in Section 7 show that the RPC algorithms outperform the traditional packed algorithms by a factor between 1.6 and 7.4 for order around 1000, and between 1.9 and 10.3 for order around 3000. For some architectures, the RPC performance results are almost the same as, or even better than, those of the traditional full-storage algorithms.
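The recursion the abstract describes can be sketched as follows. This is a minimal illustration in Python/NumPy on ordinary full-storage arrays, not the paper's RPF layout: the matrix is split into a 2x2 block form, the leading block is factored recursively, a triangular solve (the RP_TRSM step) computes the off-diagonal block, and a symmetric rank-k update (the RP_SYRK step) prepares the trailing block, with the bulk of the arithmetic landing in matrix multiplication (GEMM).

```python
import numpy as np

def recursive_cholesky(A):
    """Sketch of recursive Cholesky: returns lower-triangular L with A = L L^T.

    Uses full storage for clarity; the paper's RPC algorithm applies the same
    recursion directly to the recursive packed format.
    """
    n = A.shape[0]
    if n == 1:
        return np.array([[np.sqrt(A[0, 0])]])
    p = n // 2
    # Recursively factor the leading block: A11 = L11 L11^T
    L11 = recursive_cholesky(A[:p, :p])
    # Triangular solve (RP_TRSM step): L21 = A21 L11^{-T}
    L21 = np.linalg.solve(L11, A[p:, :p].T).T
    # Symmetric rank-k update (RP_SYRK step), done via GEMM: A22 - L21 L21^T
    A22 = A[p:, p:] - L21 @ L21.T
    # Recursively factor the trailing block
    L22 = recursive_cholesky(A22)
    # Assemble the lower-triangular factor
    L = np.zeros_like(A, dtype=float)
    L[:p, :p] = L11
    L[p:, :p] = L21
    L[p:, p:] = L22
    return L
```

Note how no block-size parameter appears anywhere: the split point is simply n // 2 at every level, which is the source of the automatic variable blocking the paper emphasizes.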
Index Terms
- A recursive formulation of Cholesky factorization of a matrix in packed storage