Abstract
As FPGA capacity increases, a growing challenge is connecting ever more components with the current low-level FPGA interconnect while keeping designers productive and on-chip communication efficient. We propose augmenting FPGAs with networks-on-chip (NoCs) to simplify design, and we show that this can be done while maintaining or even improving silicon efficiency. To explore the design space and inform our design choices, we compare the area and speed efficiency of each NoC component when implemented hard versus soft. We then build on this component-level analysis to architect hard NoCs and integrate them into the FPGA fabric; these NoCs are on average 20–23× smaller and 5–6× faster than soft NoCs. A 64-node hard NoC uses only ~2% of an FPGA's silicon area and metallization. We introduce a new communication efficiency metric: silicon area required per realized communication bandwidth. Soft NoCs consume 4960 mm²/TBps, but hard NoCs are 84× more efficient at 59 mm²/TBps. Informed design can further reduce the area overhead of NoCs to 23 mm²/TBps, which is only 2.6× less efficient than the simplest point-to-point soft links (9 mm²/TBps). Despite this nearly comparable efficiency, NoCs can switch data across the entire FPGA, whereas point-to-point links are very limited in capability; hard NoCs are therefore expected to improve FPGA efficiency for more complex styles of communication.
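The efficiency metric and the ratios quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and dictionary keys are hypothetical, and the only data used are the four area-per-bandwidth figures stated in the abstract.

```python
# Area-per-bandwidth figures quoted in the abstract, in mm^2 per TBps.
# The dictionary keys are hypothetical labels chosen for this sketch.
AREA_PER_TBPS = {
    "soft_noc": 4960.0,
    "hard_noc": 59.0,
    "hard_noc_informed": 23.0,
    "soft_point_to_point": 9.0,
}


def area_per_bandwidth(area_mm2: float, bandwidth_tbps: float) -> float:
    """Silicon area required per unit of realized bandwidth (lower is better)."""
    if bandwidth_tbps <= 0:
        raise ValueError("bandwidth must be positive")
    return area_mm2 / bandwidth_tbps


def efficiency_ratio(worse: str, better: str) -> float:
    """How many times more area per TBps `worse` needs compared with `better`."""
    return AREA_PER_TBPS[worse] / AREA_PER_TBPS[better]


if __name__ == "__main__":
    # Soft NoC versus hard NoC: roughly the 84x quoted in the abstract.
    print(f"{efficiency_ratio('soft_noc', 'hard_noc'):.1f}x")
    # Informed hard NoC versus point-to-point soft links: roughly 2.6x.
    print(f"{efficiency_ratio('hard_noc_informed', 'soft_point_to_point'):.1f}x")
```

Because the metric is a cost (area per unit of bandwidth), a smaller value means a more efficient interconnect, which is why the ratios are formed as worse over better.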
Index Terms
- Networks-on-Chip for FPGAs: Hard, Soft or Mixed?