Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) exhibit high performance, improved flexibility, low cost, and power efficiency for various application domains. Compute-intensive loop kernels, which are perfect candidates to be executed on CGRAs, are usually mapped through modified modulo scheduling algorithms. These algorithms should be capable of performing both placement and routing. We formalize the CGRA mapping problem as a graph minor containment problem. We essentially test whether the dataflow graph representing the loop kernel is a minor of the modulo routing resource graph representing the CGRA resources and their interconnects. We design an exact graph minor testing approach that exploits the unique properties of both the dataflow graph and the routing resource graph to significantly prune the search space. We introduce additional heuristic strategies that drastically improve the compilation time while still generating optimal or near-optimal mapping solutions. Experimental evaluation confirms the efficiency of our approach.
- Shail Aditya, Vinod Kathail, and B. Ramakrishna Rau. 1998. Elcor's machine description system: Version 3.0. http://www.hpl.hp.com/techreports/98/HPL-98-128.pdf.Google Scholar
- Isolde Adler, Frederic Dorn, Fedor V. Fomin, Ignasi Sau, and Dimitrios M. Thilikos. 2012. Fast minor testing in planar graphs. Algorithmica 64, 1, 69--84. Google ScholarDigital Library
- Mythri Alle, Keshavan Varadarajan, Reddy C. Ramesh, Joseph Nimmy, Alexander Fell, Adarsha Rao, S. K. Nandy, and Ranjani Narayan. 2008. Synthesis of application accelerators on runtime reconfigurable hardware. In Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors (ASAP'08). IEEE, 13--18. Google ScholarDigital Library
- Nikhil Bansal, Sumit Gupta, Nikil Dutt, and Alexandru Nicolau. 2003. Analysis of the performance of coarse-grain reconfigurable architectures with different processing element configurations. In Proceedings of the Workshop on Application Specific Processors, Held in Conjunction with the International Symposium on Microarchitecture (MICRO'03).Google Scholar
- Janina A. Brenner, Sandor P. Fekete, and Jan C. Van Der Veen. 2009. A minimization version of a directed subgraph homeomorphism problem. Math. Methods Oper. Res. 69, 2, 281--296.Google ScholarCross Ref
- Lakshmi N. Chakrapani, John Gyllenhaal, Wen-Mei W. Hwu, Scott A. Mahlke, Krishna V. Palem, and Rodric M. Rabbah. 2005. Trimaran: An infrastructure for research in instruction-level parallelism. In Languages and Compilers for High Performance Computing, Springer, 32--41. Google ScholarDigital Library
- Liang Chen and Tulika Mitra. 2012. Graph minor approach for application mapping on CGRAs. In Proceedings of the International Conference on Field-Programmable Technology (ICFPT'12). IEEE, 285--292.Google ScholarCross Ref
- Nathan Clark, Amir Hormati, Scott Mahlke, and Sami Yehia. 2006. Scalable subgraph mapping for acyclic computation accelerators. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'06). ACM Press, New York, 147--157. Google ScholarDigital Library
- Luigi P. Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. 2004. A (sub) graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26, 10, 1367--1372. Google ScholarDigital Library
- Bjorn De Sutter, Paul Coene, Tom Vander Aa, and Bingfeng Mei. 2008. Placement-and-routing-based register allocation for coarse-grained reconfigurable arrays. In Proceedings of the ACM SIGPLAN-SIGBED Conference on Languages, Compilers and Tools for Embedded System (LCTES'08). ACM Press, New York, 151--160. Google ScholarDigital Library
- Giuseppe Di Battista, Maurizio Patrignani, and Francesco Vargiu. 1998. A split&push approach to 3D orthogonal drawing. In Graph Drawing, Springer, 87--101. Google ScholarDigital Library
- Christine Eisenbeis, Sylvain Lelait, and Bruno Marmol. 1995. The meeting graph: A new model for loop cyclic register allocation. In Proceedings of the International Federation for Information Processing Working Group. 264--267. Google ScholarDigital Library
- Steven Fortune, John Hopcroft, and James Wyllie. 1980. The directed subgraph homeomorphism problem. Theor. Comput. Sci. 10, 2, 111--121.Google ScholarCross Ref
- Stephen Friedman, Allan Carroll, Brian Van Essen, Benjamin Ylvisaker, Carl Ebeling, and Scott Hauck. 2009. SPR: An architecture-adaptive cgra mapping tool. In Proceedings of the 17th ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'09). ACM Press, New York, 191--200. Google ScholarDigital Library
- Rani Gnanaolivu, Theodore S. Norvell, and Ramachandran Venkatesan. 2010. Mapping loops onto coarse-grained reconfigurable architectures using particle swarm optimization. In Proceedings of the International Conference on Soft Computing and Pattern Recognition (SoCPaR'10). IEEE, 145--151.Google ScholarCross Ref
- Rani Gnanaolivu, Theodore S. Norvell, and Ramachandran Venkatesan. 2011. Analysis of inner-loop mapping onto coarse-grained reconfigurable architectures using hybrid particle swarm optimization. Int. J. Organiz. Collect. Intell. 2, 2, 17--35. Google ScholarDigital Library
- Matthew R. Guthaus, Jeffrey S. Ringenberg, Dan Ernst, Todd M. Austin, Trevor Mudge, and Richard B. Brown. 2001. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE International Workshop on Workload Characterization. 3--14. Google ScholarDigital Library
- Mahdi Hamzeh, Aviral Shrivastava, and Sarma Vrudhula. 2012. EPIMap: Using epimorphism to map applications on CGRAs. In Proceedings of the 49th Annual Design Automation Conference (DAC'12). ACM Press, New York, 1284--1291. Google ScholarDigital Library
- Mahdi Hamzeh, Aviral Shrivastava, and Sarma Vrudhula. 2013. REGIMap: Register-aware application mapping on coarse-grained reconfigurable architectures (CGRAs). In Proceedings of the 50th Annual Design Automation Conference (DAC'13). ACM Press, New York, 18:1--18:10. Google ScholarDigital Library
- Akira Hatanaka and Nader Bagherzadeh. 2007. A modulo scheduling algorithm for a coarse-grain reconfigurable array template. In Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS'07). IEEE, 1--8.Google ScholarCross Ref
- Giuseppe F. Italiano. 1986. Amortized efficiency of a path retrieval data structure. Theor. Comput. Sci. 48, 2--3, 273--281. Google ScholarDigital Library
- Yongjoo Kim, Jongeun Lee, Aviral Shrivastava, Jonghee W. Yoon, Doosan Cho, and Yunheung Paek. 2011. High through-put data mapping for coarse-grained reconfigurable architectures. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 30, 11, 1599--1609. Google ScholarDigital Library
- Ian Kuon and Jonathan Rose. 2007. Measuring the gap between FPGAs and ASICs. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 26, 2, 203--215. Google ScholarDigital Library
- Zion Kwok and Steven J. E. Wilton. 2005. Register file architecture optimization in a coarse-grained reconfigurable architecture. In Proceedings of the 13th International Symposium on Field-Programmable Custom Computing Machines (FCCM'05). IEEE, 35--44. Google ScholarDigital Library
- Dongwook Lee, Manhwee Jo, Kyuseung Han, and Kiyoung Choi. 2009. FloRA: Coarse-grained reconfigurable architecture with floating-point operation capability. In Proceedings of the International Conference on Field-Programmable Technology (ICFPT'09). IEEE, 376--379.Google ScholarCross Ref
- Jong-Eun Lee, Kiyoung Choi, and Nikil D. Dutt. 2003. Compilation approach for coarse-grained reconfigurable architectures. IEEE Des. Test Comput. 20, 1, 26--33. Google ScholarDigital Library
- Giorgio Levi. 1973. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo 9, 4, 341--352.Google ScholarCross Ref
- Alan Marshall, Tony Stansfield, Igor Kostarnov, Jean Vuillemin, and Brad Hutchings. 1999. A reconfigurable arithmetic array for multimedia applications. In Proceedings of the 7th ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'99). ACM Press, New York, 135--143. Google ScholarDigital Library
- Bingfeng Mei, Serge Vernalde, Diederik Verkest, Hugo De Man, and Rudy Lauwereins. 2003a. ADRES: An architecture with tightly coupled vliw processor and coarse-grained reconfigurable matrix. In Proceedings of the 13th International Conference on Field Programmable Logic and Application (FPL'03). Springer, 61--70.Google ScholarCross Ref
- Bingfeng Mei, Serge Vernalde, Diederik Verkest, Hugo De Man, and Rudy Lauwereins. 2003b. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE'03). IEEE, 296--301. Google ScholarDigital Library
- Nils J. Nilsson. 1982. Principles of Artificial Intelligence. Springer. Google ScholarDigital Library
- Hyunchul Park, Kevin Fan, Manjunath Kudlur, and Scott Mahlke. 2006. Modulo graph embedding: Mapping applications onto coarse-grained reconfigurable architectures. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES'06). ACM Press, New York, 136--146. Google ScholarDigital Library
- Hyunchul Park, Kevin Fan, Scott A. Mahlke, Taewook Oh, Heeseok Kim, and Hong-Seok Kim. 2008. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT'08). ACM Press, New York, 166--176. Google ScholarDigital Library
- B. Ramakrishna Rau. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th International Symposium on Microarchitecture (MICRO'94). ACM Press, New York, 63--74. Google ScholarDigital Library
- Neil Robertson and Paul D. Seymour. 1999. Graph minors. J. Combin. Theory B77, 1, 162--210. Google ScholarDigital Library
- Neil Robertson and Paul D. Seymour. 2004. Graph minors. XX. Wagner's conjecture. J. Combin. Theory B92, 2, 325--357. Google ScholarDigital Library
- Hartej Singh, Ming-Hau Lee, Guangming Lu, Fadi J. Kurdahi, Nader Bagherzadeh, and Eliseu M. Chaves Filho. 2000. MorphoSys: An integrated reconfigurable system for data-parallel and computation-intensive applications. IEEE Trans. Comput. 49, 5, 465--481. Google ScholarDigital Library
- Mohammed A. A. Tuhin and Theodore S. Norvell. 2008. Compiling parallel applications to coarse-grained reconfigurable architectures. In Proceedings of the Canadian Conference on Electrical and Computer Engineering (CCECE'08). IEEE, 1723--1728.Google Scholar
- Julian R. Ullmann. 1976. An algorithm for subgraph isomorphism. J. ACM 23, 1, 31--42. Google ScholarDigital Library
- Jonghee W. Yoon, Aviral Shrivastava, Sanghyun Park, Minwook Ahn, and Yunheung Paek. 2009. A graph drawing based spatial mapping algorithm for coarse-grained reconfigurable architectures. IEEE Trans. VLSI Syst. 17, 11, 1565--1578. Google ScholarDigital Library
- Javier Zalamea, Josep Llosa, Eduard Ayguade, and Mateo Valero. 2003. MIRS: Modulo scheduling with integrated register spilling. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing. Springer, 239--253. Google ScholarDigital Library
Recommendations
Homomorphisms of triangle-free graphs without a K5-minor
In the course of extending Grotzsch's Theorem, we prove that every triangle-free graph without a K"5-minor is 3-colorable. It has been recently proved that every triangle-free planar graph admits a homomorphism to the Clebsch graph. We also extend this ...
On the Maximum Number of Cliques in a Graph
A clique is a set of pairwise adjacent vertices in a graph. We determine the maximum number of cliques in a graph for the following graph classes: (1) graphs with n vertices and m edges; (2) graphs with n vertices, m edges, and maximum degree Δ; (3) d-...
Approximating the list-chromatic number and the chromatic number in minor-closed and odd-minor-closed classes of graphs
STOC '06: Proceedings of the thirty-eighth annual ACM symposium on Theory of ComputingIt is well-known (Feige and Kilian [24], Håstad [39]) that approximating the chromatic number within a factor of n1-ε cannot be done in polynomial time for ε>0, unless coRP = NP. Computing the list-chromatic number is much harder than determining the ...
Comments