skip to main content
10.1145/2656075.2656085acmconferencesArticle/Chapter ViewAbstractPublication PagesesweekConference Proceedingsconference-collections
research-article

Flattening-based mapping of imperfect loop nests for CGRAs

Authors Info & Claims
Published:12 October 2014Publication History

ABSTRACT

For loop accelerators such as coarse-grained reconfigurable architectures (CGRAs) and GP-GPUs, nested loops represent an important source of parallelism. Existing solutions to mapping nested loops on CGRAs, however, are either designed for perfectly nested loops only, or expensive and inflexible. Efficient CGRA mapping of imperfect loops with arbitrary nesting depth still remains a challenge. In this paper we propose a compiler-hardware co-operative approach that is flexible and yet able to generate efficient mappings for imperfect nested loops. It is based on loop flattening, but to mitigate the negative impact of flattening we combine loop fission and a light-weight architecture extension that is designed to accelerate common operation patterns appearing frequently in flattened loops. Our experimental results using imperfect loops from multimedia and DSP domains demonstrate that our special operations can cover a large portion of nested loop operations, improve performance of nested loops by nearly 30% over using loop flattening only, and achieve near-ideal executions on CGRAs for imperfect loops.

References

  1. T. Austin, E. Larson, and D. Ernst. SimpleScalar: an infrastructure for computer system modeling. Computer, 35, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. K. Bondalapati. Parallelizing dsp nested loops on reconfigurable architectures using data context switching. In Proc. DAC, pages 273--276, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Liang Chen and T. Mitra. Graph minor approach for application mapping on cgras. In Proc. FPT, pages 285--292, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  4. Grigorios Dimitroulakos, Stavros Georgiopoulos, Michalis D. Galanis, and Costas E. Goutis. Resource aware mapping on coarse grained reconfigurable arrays. Microprocessors and Microsystems, 33(2):91--105, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Friedman et al. SPR: An architecture-adaptive CGRA mapping tool. In Proc. FPGA, pages 191--200. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Ghuloum et al. Flattening and parallelizing irregular, recurrent loop nests. In Proc. PPOPP 95, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Hiroyuki Hamasaki et al. Soc for car navigation system with a 55.3gops image recognition engine. In Proc. ASP-DAC, pages 464--465. IEEE Press, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Hamzeh et al. Regimap: Register-aware application mapping on coarse-grained reconfigurable architectures (cgras). In Proc. DAC, pages 18:1--18:10. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Kejariwal et al. Enhanced loop coalescing: A compiler technique for transforming non-uniform iteration spaces. In ISHPC, pages 17--32, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Yongjoo Kim et al. Improving performance of nested loops on reconfigurable array processors. ACM Trans. Archit. Code Optim., 8(4):32:1--32:23, January 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Kuck, R. Kuhn, B. Leasure, and M. Wolfe. The structure of an advanced vectorizer for pipelined processors. In Proc. IEEE 4th International Computer Software and Applications Conference, 1980.Google ScholarGoogle Scholar
  12. C. Lattner and V. Adve. LLVM: a compilation framework for lifelong program analysis transformation. In Proc. CGO, pages 75--86, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jaedon Lee, Youngsam Shin, Won-Jong Lee, Soojung Ryu, and Jeongwook Kim. Real-time ray tracing on coarse-grained reconfigurable processor. In Field-Programmable Technology (FPT), pages 192--197, Dec 2013.Google ScholarGoogle ScholarCross RefCross Ref
  14. Jongeun Lee et al. Fast shared on-chip memory architecture for efficient hybrid computing with CGRAs. In Proc. DATE, March 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Liu et al. Polyhedral model based mapping optimization of loop nests for cgras. In Proc. DAC, pages 19:1--19:8. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. Mei et al. Dresc: a retargetable compiler for coarse-grained reconfigurable architectures. In Proc. FPT, pages 166--173, 2002.Google ScholarGoogle Scholar
  17. A. Morvan et al. Polyhedral bubble insertion: A method to improve nested loop pipelining for high-level synthesis. IEEE Trans. CAD, 32(3):339--352, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. David A. Padua and Michael J. Wolfe. Advanced compiler optimizations for supercomputers. Commun. ACM, 29(12):1184--1201, December 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Hyunchul Park et al. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures. In Proc. PACT, pages 166--176, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Rong, Zhizhong Tang, R. Govindarajan, A. Douillet, and G. R. Gao. Single-dimension software pipelining for multi-dimensional loops. In Proc. Code Generation and Optimization, pages 163--174, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Singh et al. Morphosys: an integrated reconfigurable system for data-parallel and computation-intensive applications. IEEE Transactions on Computers, 49:465--481, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. W. Thies et al. Streamit: A language for streaming applications. In R. Horspool, editor, Compiler Construction, volume 2304 of LNCS, pages 179--196. Springer, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. K. Turkington et al. Outer loop pipelining for application specific datapaths in fpgas. IEEE Trans. VLSI, 16(10):1268--1280, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. Ylvisaker et al. Macah: A "c-level" language for programming kernels on coprocessor accelerators. In Poster at Languages, Compilers and Tools for Embedded Systems (LCTES), June 2007.Google ScholarGoogle Scholar
  25. Wei Zuo et al. Improving high level synthesis optimization opportunity through polyhedral transformations. In Proc. FPGA, pages 9--18. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Flattening-based mapping of imperfect loop nests for CGRAs

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            CODES '14: Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis
            October 2014
            331 pages
            ISBN:9781450330510
            DOI:10.1145/2656075

            Copyright © 2014 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 12 October 2014

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate280of864submissions,32%

            Upcoming Conference

            ESWEEK '24
            Twentieth Embedded Systems Week
            September 29 - October 4, 2024
            Raleigh , NC , USA

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader