ABSTRACT
High-level synthesis (HLS) is an important enabling technology for the adoption of hardware accelerators. It promises the performance and energy efficiency of hardware designs with a lower barrier to entry in design expertise and a shorter design time. State-of-the-art HLS tools now include a wide variety of powerful optimizations for implementing efficient hardware. These optimizations cover some of the most important features of manual designs, including parallel hardware units, pipelining of execution both within a hardware unit and between units, and fine-grained data communication. We may broadly classify these optimizations as those that optimize the hardware implementation within a code block (intra-block) and those that optimize communication and pipelining between code blocks (inter-block). In practice, however, both classes of optimization are difficult to apply: real-world applications contain data-dependent blocks of code and communicate through complex data access patterns. Existing HLS tools cannot apply these powerful optimizations unless the code is inherently compatible, which severely limits the optimization opportunity. In this paper we present an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially improves the opportunity to use the powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. Our polyhedral-model-based technique systematically defines a set of data access patterns, identifies effective ones, and performs the loop transformations needed to enable the intra- and inter-block optimizations. Our framework automatically explores transformation options, performs the code transformations, and inserts the appropriate HLS directives to implement the optimizations.
Furthermore, our framework automatically generates optimized communication blocks for fine-grained communication between hardware blocks. Experimental evaluation demonstrates an average speedup of 6.04x over the high-level synthesis solution without our transformations to enable intra- and inter-block optimizations.