Abstract
The AllReduce algorithm is a promising new algorithm for parallelizing the Householder QR decomposition A = QR of a tall and skinny matrix. It divides the input matrix A vertically in a recursive manner, computes the QR decompositions of each submatrix independently, and merges the results to obtain the QR decomposition of A. While this algorithm has been shown to achieve excellent speedup in various parallel environments, its rounding error properties have not been elucidated yet. In this paper, we present theoretical error analysis of the AllReduce algorithm. Specifically, we derive bounds for the backward error of A and deviation from orthogonality of the computed Q factor. Our analysis shows that both of these bounds are smaller than their counterparts for the conventional Householder QR algorithm. Moreover, the bounds decrease as the number of submatrices increases. These results are supported by numerical experiments. Thus we can conclude that the AllReduce algorithm can be used as a reliable method of orthogonalization in parallel environments.
Similar content being viewed by others
References
Bai Z., Day D., Ye Q.: ABLE: an adaptive Block Lanczos method for non-Hermitian eigenvalue problems. SIAM J. Matrix Anal. Appl. 20, 1060–1082 (1999)
Bai, Z., Demmel, J.W., Dongarra, J.J., Ruhe, A., van der Vorst, H. (eds.): Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia (2000)
Bischof C., Lang B.: Algorithm 807: the SBR Toolbox—software for successive band reduction. ACM Trans. Math. Softw. 26(4), 602–616 (2000)
Demmel, J.W., Grigori, L., Hoemmen, M.F., Langou, J.: Communication-optimal parallel and sequential QR and LU factorizations. LAPACK Working Notes, No. 204 (2008)
Freund R.W., Malhotra M.: A Block QMR algorithm for non-Hermitian linear systems with multiple right-hand sides. Linear Algebra Appl. 254, 119–157 (1997)
Golub G.H., Van Loan C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)
Higham N.J.: Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia (1996)
Langou, J.: AllReduce Algorithms: Application to Householder QR Factorization. Presentation at Preconditioning 2007, Toulouse, France, July 9–12, 2007. http://www.precond07.enseeiht.fr/Talks/langou/langou.pdf
Mori, D., Yamamoto, Y., Zhang, S.-L.: Performance and accuracy of the AllReduce algorithm for the householder QR factorization (in Japanese). IPSJ SIG Technical Report, 2008-HPC-117, pp. 25–29 (2008)
Murakami, H.: A multi-staged Block algorithm of householder-type orthogonal transformation for distributed parallel processing (in Japanese). IPSJ SIG Technical Report, 2008-HPC-117, pp. 19–24 (2008)
O’Leary D.P.: The Block conjugate gradient algorithm and related methods. Linear Algebra Appl. 29, 293–322 (1980)
Toledo S., Rabani E.: Very large electronic structure calculations using an out-of-core filter diagonalization method. J. Comput. Phys. 180(1), 256–269 (2002)
Author information
Authors and Affiliations
Corresponding author
About this article
Cite this article
Mori, D., Yamamoto, Y. & Zhang, SL. Backward error analysis of the AllReduce algorithm for householder QR decomposition. Japan J. Indust. Appl. Math. 29, 111–130 (2012). https://doi.org/10.1007/s13160-011-0053-x
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13160-011-0053-x