A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication | IBM Journals & Magazine | IEEE Xplore