skip to main content
research-article

Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion

Published:01 April 2012Publication History
Skip Abstract Section

Abstract

Techniques and algorithms for efficient in-place conversion to and from standard and blocked matrix storage formats are described. Such functionality is required by numerical libraries that use different data layouts internally. Parallel algorithms and a software package for in-place matrix storage format conversion based on in-place matrix transposition are presented and evaluated. A new algorithm for in-place transposition which efficiently determines the structure of the transposition permutation a priori is one of the key ingredients. It enables effective load balancing in a parallel environment.

References

  1. Alltop, W. O. 1975. A computer algorithm for transposing nonsquare matrices. IEEE Trans. Comput. 24, 10, 1038--1040. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bader, M. and Zenger, C. 2006. Cache oblivious matrix multiplication using an element ordering based on a Peano curve. Linear Algebra Its Appl. 417, 2--3, 301--313.Google ScholarGoogle ScholarCross RefCross Ref
  3. Bader, M. and Heinecke, A. 2008. Parallel matrix multiplication based on space-filling curves on shared memory multicore platforms. In Proceedings of the Computing Frontiers Conferene and Co-Located Workshops (MAW'08 and WRFT'08). 385--392. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Berman, M. F. 1958. A method for transposing a matrix. J. ACM 5, 4, 383--384. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Boothroyd, J. 1967. Algorithm 302: Transpose vector stored array. Comm. ACM 10, 5, 292--293. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Brebner, M. A. and Laflin, S. 1970. Algorithm 380: In-situ transposition of a rectangular matrix. Comm. ACM 13, 5, 324--326. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Brenner, N. 1973. Algorithm 467: Matrix transposition in place. Comm. ACM 16, 11, 692--694. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Cate, E. G. and Twigg, D. W. 1977. Algorithm 513: Analysis of in-situ transposition. ACM Trans. Math. Softw. 3, 1, 104--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chan, E., van de Geijn, R., Quintana-Ort, E. S., Quintana-Ort, G., and van Zee, F. G. 2008. Programming algorithms-by-blocks for matrix computations on multithreaded architectures. Tech. rep. TR-08-04, Department of Computer Sciences, University of Texas as Austin.Google ScholarGoogle Scholar
  10. Dongarra, J. and Kurzak, J. 2009. QR factorization for the CELL Broadband Engine. Sci. Progr. 17, 1--2, 31--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Dow, M. 1995. Transposing a matrix on a vector computer. Parall. Comput. 21, 12, 1997--2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Elmroth, E., Gustavson, F., Jonsson, I., and Kågström, B. 2004. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Rev. 46, 1, 3--45.Google ScholarGoogle ScholarCross RefCross Ref
  13. Fich, F. E., Munro, J. I., and Poblete, P. V. 1995. Permuting in place. SIAM J. Comput. 24, 2, 266--278. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Fraser, D. 1976. Array permutation by index-digit permutation. J. ACM 23, 298--309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gallivan, K., Jalby, W., Meier, U., and Sameh, A. 1988. The impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2, 1, 12--48.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gustavson, F. 2000. New generalized matrix data structures lead to a variety of high-performance algorithms. In Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software. Kluwer, B.V., 211--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Gustavson, F. 2008. The relevance of new data structure approaches for dense linear algebra in the new multicore/manycore environments. Tech. rep. RC24599, IBM Research.Google ScholarGoogle Scholar
  18. Gustavson, F. and Swirszcz, T. 2007. In-place transposition of rectangular matrices. In Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing Conference. B. Kågström et al. Eds., Lecture Notes in Computer Science, vol. 4699, Springer, 560--569. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Gustavson, F., Karlsson, L., and Kågström, B. 2009. Distributed SBP Cholesky factorization algorithms with near-optimal scheduling. ACM Trans. Math. Softw. 36, 2, 11:1--11:25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Gustavson, F., Henriksson, A., Jonsson, I., Kågström, B., and Ling, P. 1998. Recursive blocked data formats and BLAS's for dense linear algebra algorithms. In Proceedings of the 4th International Workshop on Applied Parallel Computing. Large Scale Scientific and Industrial Problems. Lecture Notes in Computer Science, vol. 1541, 195--206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Hong, B., Park, N., and Prasanna, V. K. 2003. Tiling, block data layout, and memory hierarchy performance. IEEE Trans. Parall. Distrib. Syst. 14, 7, 640--654. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Huang, C. H., Johnson, J. R., Johnson, R. W., Kaushik, S. D., and Sadayappan, P. 1993. Efficient transposition algorithms for large matrices. In Proceedings of Supercomputing'93. 656--665. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. IBM. 1986. IBM Engineering and scientific subroutine library. ESSL Guide and Reference, SA22-7272-00.Google ScholarGoogle Scholar
  24. Johnson, J. R. 1995. Matrix transposition. Department of Mathematics and Computer Science, Drexel University, Philadelphia, PA 19104. Manuscript. http://cricket.cs.drexel/edu/˜jjohnson/2007-08/fell/cs680/paper/transpose.pdfGoogle ScholarGoogle Scholar
  25. Karlsson, L. 2009. Blocked in-place transposition with application to storage format conversion. Tech. rep. UMINF 09.01. ISSN 0348-0542, Department of Computing Science, Umeå University, Umeå, Sweden.Google ScholarGoogle Scholar
  26. Knuth, D. E. 1971. Mathematical analysis of algorithms. In Proceedings of IFIP Congress. North-Holland, 19--27.Google ScholarGoogle Scholar
  27. Knuth, D. 1998. The Art of Computer Programming. Vols. 1 and 2, 3rd Ed. Addison-Wesley.Google ScholarGoogle Scholar
  28. Lam, M. D.,Rothberg, E. E., and Wolf, M. E. 1991. The cache performance and optimizations of blocked algorithms. ACM SIGARCH Computer Architect. News 19, 2, 63--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Pall, G. and Seiden, E. 1960. A problem in Abelian groups, with application to the transposition of a matrix on an electronic computer. Math. Comput. 14, 70, 189--192.Google ScholarGoogle ScholarCross RefCross Ref
  30. Windley, P. F. 1959. Transposing matrices in a digital computer. Comput. J. 2, 1, 47--48.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Mathematical Software
        ACM Transactions on Mathematical Software  Volume 38, Issue 3
        April 2012
        157 pages
        ISSN:0098-3500
        EISSN:1557-7295
        DOI:10.1145/2168773
        Issue’s Table of Contents

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 April 2012
        • Accepted: 1 July 2011
        • Revised: 1 February 2011
        • Received: 1 February 2010
        Published in toms Volume 38, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader