Abstract
We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning parameters for high-end instantiations of matrix-matrix multiplication. This is of both practical and scientific importance: it greatly reduces the development effort required to implement the level-3 BLAS, while also advancing our understanding of how hierarchically layered memories interact with high-performance software. It allows the community to move from valuable engineering solutions (empirical autotuning) to scientific understanding (analytical insight).
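To give a concrete flavor of deriving blocking parameters analytically rather than by search, the sketch below computes Goto-style cache blocking sizes from hardware constants. It is a minimal illustration under assumptions, not the paper's actual model: the cache sizes, the micro-tile shape (MR, NR), and the "fill about half the cache" occupancy rule are all hypothetical choices for the example, whereas the paper's analysis additionally accounts for cache associativity, replacement policy, and floating-point instruction latency.

```c
#include <stdio.h>

/* Hypothetical hardware parameters for illustration only:
   a core with 32 KiB L1 data cache and 256 KiB L2, double precision. */
#define S_DATA   8             /* bytes per double                        */
#define L1_SIZE  (32 * 1024)   /* L1 data cache in bytes                  */
#define L2_SIZE  (256 * 1024)  /* L2 cache in bytes                       */
#define MR       8             /* micro-tile rows (register blocking)     */
#define NR       4             /* micro-tile columns                      */

int main(void) {
    /* Goto-style rule of thumb: a k_c x NR micro-panel of B should
       occupy roughly half of L1, leaving room for the MR x k_c
       micro-panel of A that streams past it. */
    int kc = (L1_SIZE / 2) / (NR * S_DATA);

    /* The m_c x k_c packed block of A should fill about half of L2,
       so the micro-panels of B passing through do not evict it. */
    int mc = (L2_SIZE / 2) / (kc * S_DATA);

    /* Round blocking parameters down to multiples of the micro-tile. */
    kc -= kc % 4;   /* keep k_c a multiple of a typical unroll factor */
    mc -= mc % MR;

    printf("m_r x n_r = %d x %d, k_c = %d, m_c = %d\n", MR, NR, kc, mc);
    return 0;
}
```

With these illustrative constants the rule yields k_c = 512 and m_c = 32; the point of the paper is that such estimates can be made rigorous, so that the resulting parameters match or beat empirically tuned ones without any search.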