
FLAME: Formal Linear Algebra Methods Environment

Published: 01 December 2001

Abstract

Since the advent of high-performance distributed-memory parallel computing, the need for intelligible code has become ever greater. The development and maintenance of libraries for these architectures is simply too complex to be amenable to conventional approaches to implementation. Attempts to employ traditional methodology have led, in our opinion, to the production of an abundance of anfractuous code that is difficult to maintain and almost impossible to upgrade.

Having struggled with these issues for more than a decade, we have concluded that a solution is to apply a technique from theoretical computer science, formal derivation, to the development of high-performance linear algebra libraries. We believe the resulting approach yields aesthetically pleasing, coherent code that greatly facilitates intelligent modularity and high performance while enhancing confidence in its correctness. Since the technique is language-independent, it lends itself equally well to a wide spectrum of programming languages (and paradigms) ranging from C and Fortran to C++ and Java. In this paper, we illustrate our observations by looking at the Formal Linear Algebra Methods Environment (FLAME), a framework that facilitates the derivation and implementation of linear algebra algorithms on sequential architectures. This environment demonstrates that lessons learned in the distributed-memory world can guide us toward better approaches even in the sequential world.

We present performance experiments on the Intel® Pentium® III processor that demonstrate that high performance can be attained by coding at a high level of abstraction.
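To give a flavor of the partitioning-based style of algorithm the abstract refers to, the sketch below expresses an unblocked Cholesky factorization in terms of a repeated repartitioning of the matrix into a diagonal element, the column below it, and the trailing submatrix. This is an illustrative NumPy rendering, not the actual FLAME API (which targets C and Fortran); the function name `chol_flame` is hypothetical.

```python
import numpy as np

def chol_flame(A):
    """Unblocked right-looking Cholesky, written in a FLAME-like
    partitioning style: at each step the matrix is viewed as

        [ alpha11   *   ]
        [   a21    A22  ]

    where alpha11 is a scalar, a21 a column vector, A22 the trailing part.
    Overwrites a copy of A and returns the lower-triangular factor L
    with L @ L.T equal to the original symmetric positive-definite A.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n):
        # alpha11 := sqrt(alpha11)
        A[k, k] = np.sqrt(A[k, k])
        # a21 := a21 / alpha11
        A[k + 1:, k] /= A[k, k]
        # A22 := A22 - a21 * a21^T  (symmetric rank-1 update)
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k + 1:, k])
    return np.tril(A)
```

The point of the notation is that each loop iteration is stated purely in terms of the current partitioning, which is what makes the derivation (and a proof of correctness via loop invariants) systematic; a blocked, high-performance variant replaces the scalar/vector operations with small matrix operations cast in terms of level-3 BLAS.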



Published in

ACM Transactions on Mathematical Software, Volume 27, Issue 4 (December 2001), 88 pages.
ISSN: 0098-3500. EISSN: 1557-7295. DOI: 10.1145/504210.

Copyright © 2001 ACM. Publisher: Association for Computing Machinery, New York, NY, United States.
