research-article

Flattening-based mapping of imperfect loop nests for CGRAs

Authors:
Jongeun Lee, Seongseok Seo, Hongsik Lee, Hyeon Uk Sim

School of ECE, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Korea

CODES '14: Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, October 2014, Article No. 9, pages 1–10. https://doi.org/10.1145/2656075.2656085

Published: 12 October 2014


ABSTRACT

For loop accelerators such as coarse-grained reconfigurable architectures (CGRAs) and GPGPUs, nested loops are an important source of parallelism. Existing solutions for mapping nested loops onto CGRAs, however, are either designed for perfectly nested loops only or are expensive and inflexible. Efficient CGRA mapping of imperfect loops with arbitrary nesting depth remains a challenge. In this paper we propose a cooperative compiler-hardware approach that is flexible yet able to generate efficient mappings for imperfectly nested loops. It is based on loop flattening; to mitigate the negative impact of flattening, we combine it with loop fission and a lightweight architecture extension designed to accelerate common operation patterns that appear frequently in flattened loops. Our experimental results using imperfect loops from the multimedia and DSP domains demonstrate that our special operations can cover a large portion of nested loop operations, improve the performance of nested loops by nearly 30% over using loop flattening alone, and achieve near-ideal executions on CGRAs for imperfect loops.
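To make the idea of flattening an imperfect nest concrete, the C sketch below is our own illustration, not code from the paper. It takes a small, hypothetical imperfect nest (row sums of an n x m matrix, with statements before and after the inner loop) and rewrites it as a single loop of n*m iterations. The function names `row_sums_nested` and `row_sums_flat` are hypothetical. The guarded statements and the explicit counter update in the flattened loop show the kind of extra operations that flattening introduces; the paper's actual special operations and fission strategy are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical imperfect nest: row sums of an n x m row-major matrix.
   The statements before and after the inner loop make the nest imperfect. */
void row_sums_nested(const int *a, int *out, size_t n, size_t m) {
    for (size_t i = 0; i < n; i++) {
        int s = 0;                     /* outer-only statement (before inner loop) */
        for (size_t j = 0; j < m; j++)
            s += a[i * m + j];         /* inner-loop body */
        out[i] = s;                    /* outer-only statement (after inner loop) */
    }
}

/* Flattened sketch: one loop of n*m iterations. The outer-only statements
   become guarded by tests on j, and the (i, j) counters are updated
   explicitly on every iteration. Assumes m > 0. */
void row_sums_flat(const int *a, int *out, size_t n, size_t m) {
    size_t i = 0, j = 0;
    int s = 0;
    for (size_t k = 0; k < n * m; k++) {
        if (j == 0) s = 0;             /* guarded prologue of the original outer body */
        s += a[i * m + j];             /* original inner-loop body */
        if (j == m - 1) out[i] = s;    /* guarded epilogue of the original outer body */
        if (++j == m) { j = 0; i++; }  /* explicit index update */
    }
}
```

Both functions compute the same result; for the 2 x 3 matrix {1, 2, 3, 4, 5, 6} each writes {6, 15} to `out`. The flattened form exposes a single long loop to a modulo scheduler, at the cost of a compare, a guarded operation or two, and a counter update per iteration.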



Published in
CODES '14: Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis
October 2014, 331 pages
ISBN: 9781450330510
DOI: 10.1145/2656075
Program Chairs: Radu Marculescu (Carnegie Mellon University), Gabriela Nicolescu (Polytechnique Montréal)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery, New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 280 of 864 submissions, 32%