Article

Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures

Authors:
Hyunchul Park

University of Michigan, Ann Arbor, MI

University of Michigan, Ann Arbor, MI
View Profile

,
Kevin Fan

University of Michigan, Ann Arbor, MI

University of Michigan, Ann Arbor, MI
View Profile

,
Manjunath Kudlur

University of Michigan, Ann Arbor, MI

University of Michigan, Ann Arbor, MI
View Profile

,
Scott Mahlke

University of Michigan, Ann Arbor, MI

University of Michigan, Ann Arbor, MI
View Profile

CASES '06: Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systemsOctober 2006Pages 136–146https://doi.org/10.1145/1176760.1176778

Published:22 October 2006Publication History

CASES '06: Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems

Pages 136–146

ABSTRACT

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost and energy efficiency. CGRAs consist of an array of function units and register files generally organized as a two dimensional grid. The most difficult challenge with deploying CGRAs is compiler scheduling technology that can map software implementations of compute intensive loops onto the array. Traditional schedulers are not suitable because they do not take into account the explicit routing of operand values that is necessary. In essence, the problem of binding operations to time slots and resources is extended to also include explicit routing of operands from producers to consumers. To tackle this problem, this paper introduces a software pipelining technique for mapping loop bodies onto CGRAs, referred to as modulo graph embedding. We leverage graph embedding from graph theory, which is used to draw graphs onto a target space. The loop body is essentially drawn onto the CGRA mesh, subject to modulo resource usage constraints. Modulo graph embedding is effective because it can take into account the communication structure of the loop body during mapping. On average, a compute utilization of 56-68% is achieved for a set of loop kernels across three 4x4 CGRA designs.

References

T. Callahan, J. Hauser, and J. Wawrzynek. The Garp architecture and C compiler. IEEE Computer, 33(4):62--69, Apr. 2000. Google ScholarDigital Library
R. Davidson and D. Harel. Drawing graphs nicely using simulated annealing. ACM Transactions on Graphics, 15(4):301--331, 1996. Google ScholarDigital Library
P. Eades. A heuristic for graph drawing. Congressus Numerantium, 42:149--160, 1984.Google Scholar
C. Ebeling et al. Mapping applications to the RaPiD configurable architecture. In Proc. of the 5th IEEE Symposium on Field-Programmable Custom Computing Machines, pages 106--115, Apr. 1997. Google ScholarDigital Library
J. Ellis. Bulldog: A Compiler for VLIW Architectures. MIT Press, Cambridge, MA, 1985. Google ScholarDigital Library
S. Goldstein et al. PipeRench: A coprocessor for streaming multimedia acceleration. In Proc. of the 26th Annual International Symposium on Computer Architecture, pages 28--39, June 1999. Google ScholarDigital Library
T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Information Processing Letters, 31:7--15, 1989. Google ScholarDigital Library
G. Krishnamurthy, E. Granston, and E. Stotzer. Affinity-based cluster assignment for unrolled loops. In Proc. of the 2002 International Conference on Supercomputing, pages 107--116, June 2002. Google ScholarDigital Library
J. Lee, K. Choi, and N. Dutt. Compilation approach for coarse-grained reconfigurable architectures. IEEE Design & Test of Computers, 20(1):26--33, Jan. 2003. Google ScholarDigital Library
W. Lee et al. Space-time scheduling of instruction-level parallelism on a RAW machine. In Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 46--57, Oct. 1998. Google ScholarDigital Library
W. Lee, D. Puppin, S. Swenson, and S. Amarasinghe. Convergent scheduling. In Proc. of the 35th Annual International Symposium on Microarchitecture, pages 111--122, 2002. Google ScholarDigital Library
W. Li and H. Kurata. A grid layout algorithm for automatic drawing of biochemical networks. Bioinformatics, 21(9):2036--2042, 2005. Google ScholarDigital Library
G. Lu, H. Singh, M. -H. Lee, N. Bagherzadeh, F. J. Kurdahi, and E. M. C. Filho. The MorphoSys parallel reconfigurable system. In Proc. of the 5th International Euro-Par Conference, pages 727--734, 1999. Google ScholarDigital Library
B. Mei, S. Vernalde, D. Verkest, H. D. Man, and R. Lauwereins. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling. In Proc. of the 2003 Design, Automation and Test in Europe, pages 296--301, 2003. Google ScholarDigital Library
E. Nystrom and A. E. Eichenberger. Effective cluster assignment for modulo scheduling. In Proc. of the 31st Annual International Symposium on Microarchitecture, pages 103--114, Dec. 1998. Google ScholarDigital Library
B. R. Rau. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proc. of the 27th Annual International Symposium on Microarchitecture, pages 63--74, Nov. 1994. Google ScholarDigital Library
J. Sanchez and A. Gonzalez. Modulo scheduling for a fully-distributed clustered VLIW architecture. In Proc. of the 33rd Annual International Symposium on Microarchitecture, pages 124--133, Dec. 2000. Google ScholarDigital Library
M. B. Taylor et al. The Raw microprocessor: A computational fabric for software circuits and general purpose programs. IEEE Micro, 22(2):25--35, 2002. Google ScholarDigital Library
G. Venkataramani, W. Najjar, F. Kurdahi, N. Bagherzadeh, and W. Bohm. A compiler framework for mapping applications to a coarse-grained reconfigurable computer architecture. pages 116--125, 2001. Google ScholarDigital Library

Index Terms

Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register ...
Read More
Modulo scheduling for highly customized datapaths to increase hardware reusability
CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization

In the embedded domain, custom hardware in the form of ASICs is often used to implement critical parts of applications when performance and energy efficiency goals cannot be met with software implementations on a general purpose processor or DSP. The ...
Read More
CGRA express: accelerating execution using dynamic operation fusion
CASES '09: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been effectively used ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CASES '06: Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
October 2006
448 pages
ISBN:1595935436
DOI:10.1145/1176760
General Chairs:
Seongsoo Hong
Seoul National University
,
Wayne Wolf
Princeton University
,
Program Chairs:
Krisztian Flautner
ARM Ltd.
,
Taewhan Kim
Seoul National University
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 October 2006
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
coarse-grained reconfigurable architecture
graph embedding
modulo scheduling
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate52of230submissions,23%
Upcoming Conference
ESWEEK '24

Sponsor:

sigbed

sigbed

sigbed

Twentieth Embedded Systems Week

September 29 - October 4, 2024

Raleigh , NC , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 61
  Total Citations
  View Citations
- 691
  Total Downloads
- Downloads (Last 12 months)30
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures

CASES '06: Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Modulo scheduling for highly customized datapaths to increase hardware reusability

CGRA express: accelerating execution using dynamic operation fusion