
A recursive formulation of Cholesky factorization of a matrix in packed storage

Published: 01 June 2001

Abstract

A new compact way to store a symmetric or triangular matrix, called RPF (Recursive Packed Format), is fully described. Novel ways to transform RPF to and from standard packed format are included. A new algorithm, called RPC for Recursive Packed Cholesky, that operates on the RPF format is presented. Algorithm RPC is based on level-3 BLAS and requires variants of algorithms TRSM and SYRK that work on RPF. We call these RP_TRSM and RP_SYRK and find that they do most of their work by calling GEMM. It follows that most of the execution time of RPC lies in GEMM. The advantage of this storage scheme compared to traditional packed and full storage is demonstrated. First, the RPF storage format uses the minimal amount of storage for the symmetric or triangular matrix. Second, RPC gives a level-3 implementation of Cholesky factorization, whereas standard packed implementations are only level 2; hence the performance of our RPC implementation is decidedly superior. Third, unlike fixed-block-size algorithms, RPC requires no block-size tuning parameter. We present performance measurements on several current architectures that demonstrate improvements over the traditional packed routines. SMP parallel computations on the IBM SMP computer are also reported. The graphs in Section 7 show that the RPC algorithms are superior to the traditional packed algorithms by a factor between 1.6 and 7.4 for orders around 1000, and between 1.9 and 10.3 for orders around 3000. For some architectures, the RPC performance results are almost the same as, or even better than, those of the traditional full-storage algorithms.
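To make the recursion structure concrete, the following NumPy sketch (an editorial illustration, not the authors' Fortran code) shows how RPC, RP_TRSM, and RP_SYRK can be organized. It assumes the lower triangle of an order-n matrix is stored in RPF as three consecutive segments: the recursively packed triangle of A11, the full n2-by-n1 rectangle A21 in column-major order, and the recursively packed triangle of A22, with the split n1 = n // 2. The helper names tri, rp_trsm, rp_syrk, and rpc are all hypothetical.

import numpy as np

def tri(n):
    # Elements in an order-n packed triangle (same count as standard packed storage).
    return n * (n + 1) // 2

def rp_trsm(L, loff, n, B):
    # Sketch of RP_TRSM (hypothetical): solve X * L^T = B in place, where the
    # order-n lower triangular L sits in RPF inside flat array L at offset loff,
    # and B is a full m-by-n NumPy view that is overwritten by X.
    if n == 1:
        B[:, 0] /= L[loff]
        return
    n1 = n // 2
    n2 = n - n1
    b_off = loff + tri(n1)        # start of the L21 rectangle
    t2_off = b_off + n2 * n1      # start of the packed L22 triangle
    L21 = L[b_off:t2_off].reshape(n2, n1, order='F')
    rp_trsm(L, loff, n1, B[:, :n1])     # X1 * L11^T = B1 (recursive)
    B[:, n1:] -= B[:, :n1] @ L21.T      # GEMM update: B2 -= X1 * L21^T
    rp_trsm(L, t2_off, n2, B[:, n1:])   # X2 * L22^T = B2 (recursive)

def rp_syrk(C, coff, n, B):
    # Sketch of RP_SYRK (hypothetical): C := C - B * B^T on the lower triangle
    # of C, order n in RPF at offset coff; B is a full n-by-k NumPy array.
    if n == 1:
        C[coff] -= B[0] @ B[0]
        return
    n1 = n // 2
    n2 = n - n1
    b_off = coff + tri(n1)
    t2_off = b_off + n2 * n1
    C21 = C[b_off:t2_off].reshape(n2, n1, order='F')
    rp_syrk(C, coff, n1, B[:n1])        # C11 -= B1 * B1^T (triangle, recursive)
    C21 -= B[n1:] @ B[:n1].T            # GEMM update: C21 -= B2 * B1^T
    rp_syrk(C, t2_off, n2, B[n1:])      # C22 -= B2 * B2^T (triangle, recursive)

def rpc(A, off, n):
    # Sketch of RPC (hypothetical): in-place Cholesky of an order-n SPD matrix
    # whose lower triangle is stored in RPF inside flat array A at offset off.
    if n == 1:
        A[off] = np.sqrt(A[off])
        return
    n1 = n // 2
    n2 = n - n1
    b_off = off + tri(n1)
    t2_off = b_off + n2 * n1
    A21 = A[b_off:t2_off].reshape(n2, n1, order='F')
    rpc(A, off, n1)                # A11 = L11 * L11^T (recursive)
    rp_trsm(A, off, n1, A21)       # L21 := A21 * L11^{-T}
    rp_syrk(A, t2_off, n2, A21)    # A22 := A22 - L21 * L21^T
    rpc(A, t2_off, n2)             # factor the updated A22 (recursive)

Unlike this sketch, which recurses all the way to order 1, the implementation described in the paper performs the rectangular updates with tuned GEMM, which is where most of the execution time is spent.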




        Reviews

        Maurice W. Benson

Due to symmetry, only the lower (or upper) triangular part of the matrix need be stored in Cholesky factorization. With standard packed storage, the matrix elements of the triangular part are ordered by columns (or rows) and stored in a one-dimensional array. This presents efficiency problems, because the memory hierarchy (CPU, registers, cache, and so on) is used well only if the algorithm exhibits locality of reference (blocking). These considerations motivate a new packed storage arrangement produced by a recursive approach: at the coarsest level, the triangular part of the matrix is partitioned into an approximately square part and two smaller triangular parts. The process continues recursively on the two smaller triangular parts, and the final packing takes the same total space as standard packing. The recursive Cholesky algorithm using this new packed storage achieves superior performance by exploiting level-3 BLAS (Basic Linear Algebra Subprograms). Using well-constructed figures, the paper provides a clear description of the algorithms. A significant part of the presentation involves tests on seven machines over a range of matrix sizes, comparing standard packing, the new formulation, and full-matrix (no packing) algorithms. The new approach is shown to be significantly more effective than the one using standard packing. This work should be of interest to researchers in computational linear algebra, as well as to those developing high-performance software libraries.
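To make the partition the reviewer describes concrete, here is a minimal packer: an editorial sketch in NumPy, not code from the paper, and pack_rpf is a hypothetical name. It splits the lower triangle at n1 = n // 2 into the packed triangle of A11, the column-major rectangle A21, and the packed triangle of A22, recursing on the two triangles.

import numpy as np

def pack_rpf(A):
    # Hypothetical packer: lower triangle of symmetric A into Recursive Packed
    # Format, laid out as [packed A11 | column-major A21 | packed A22].
    n = A.shape[0]
    if n == 1:
        return A[:1, 0].copy()
    n1 = n // 2
    return np.concatenate([
        pack_rpf(A[:n1, :n1]),          # T1: recursively packed A11
        A[n1:, :n1].ravel(order='F'),   # B : full n2-by-n1 rectangle A21
        pack_rpf(A[n1:, n1:]),          # T2: recursively packed A22
    ])

# The packing occupies exactly the space of standard packed storage.
n = 5
M = np.random.randn(n, n)
A = M @ M.T + n * np.eye(n)             # symmetric positive definite test matrix
assert pack_rpf(A).size == n * (n + 1) // 2

Storing the rectangle in column-major (Fortran) order is what allows the approximately square part to be handed directly to GEMM, which is the source of the level-3 performance the review notes.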
