ABSTRACT
High-level synthesis (HLS) is an important enabling technology for the adoption of hardware accelerators. It promises the performance and energy efficiency of hardware designs with a lower barrier to entry in design expertise and a shorter design time. State-of-the-art HLS tools now include a wide variety of powerful optimizations for implementing efficient hardware. These optimizations cover some of the most important features of manual designs, including parallel hardware units, pipelining of execution both within a hardware unit and between units, and fine-grained data communication. We may broadly classify these optimizations as those that optimize the hardware implementation within a code block (intra-block) and those that optimize communication and pipelining between code blocks (inter-block). In practice, however, both classes of optimization are difficult to apply: real-world applications contain data-dependent blocks of code and communicate through complex data access patterns. Existing HLS tools cannot apply these powerful optimizations unless the code is inherently compatible, which severely limits the optimization opportunity. In this paper we present an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially improves the opportunity to use the powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. Our polyhedral-model-based technique systematically defines a set of data access patterns, identifies effective ones, and performs the loop transformations needed to enable the intra- and inter-block optimizations. Our framework automatically explores transformation options, performs the code transformations, and inserts the appropriate HLS directives to implement the optimizations.
Furthermore, our framework automatically generates optimized communication blocks for fine-grained communication between hardware blocks. Experimental evaluation demonstrates an average speedup of 6.04x over the high-level synthesis solution without our transformations to enable intra- and inter-block optimizations.