1993 | OriginalPaper | Chapter
Scalability of Sparse Direct Solvers
Author : Robert Schreiber
Published in: Graph Theory and Sparse Matrix Computation
Publisher: Springer New York
Included in: Professional Book Archive
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
We shall say that a scalable algorithm achieves efficiency that is bounded away from zero as the number of processors and the problem size increase in such a way that the size of the data structures increases linearly with the number of processors. In this paper we show that the column-oriented approach to sparse Cholesky for distributed-memory machines is not scalable. By considering message volume, node contention, and bisection width, one may obtain lower bounds on the time required for communication in a distributed algorithm. Applying this technique to distributed, column-oriented, dense Cholesky leads to the conclusion that N (the order of the matrix) must scale with P (the number of processors) so that storage grows like P2. So the algorithm is not scalable. Identical conclusions have previously been obtained by consideration of communication and computation latency on the critical path in the algorithm; these results complement and reinforce that conclusion.For the sparse case, both theory and some new experimental measurements, reported here, make the same point: for column-oriented distributed methods, the number of gridpoints (which is O(N)) must grow as P2 in order to maintain parallel efficiency bounded above zero. Our sparse matrix results employ the “fan-in” distributed scheme, implemented on machines with either a grid or a fat-tree interconnect using a subtree-to-submachine mapping of the columns.The alternative of distributing the rows and columns of the matrix to the rows and columns of a grid of processors is shown to be scalable for the dense case. Its scalability for the sparse case has been established previously [10]. To date, however, none of these methods has achieved high efficiency on a highly parallel machine.Finally, open problems and other approaches that may be more fruitful are discussed.