Top

2011 | Book

Low Power Networks-on-Chip

Editors: Cristina Silvano, Marcello Lajolo, Gianluca Palermo

Publisher: Springer US

Part of: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

About this book

In recent years, both Networks-on-Chip, as an architectural solution for high-speed interconnect, and power consumption, as a key design constraint, have continued to gain interest in the design and research communities. This book offers a single-source reference to some of the most important design techniques proposed in the context of low-power design for networks-on-chip architectures.

Frontmatter

Low-Level Design Techniques

Frontmatter

Chapter 1. Hybrid Circuit/Packet Switched Network for Energy Efficient on-Chip Interconnections

Abstract

Network on-Chip (NoC) is an interconnect fabric to connect sub-system blocks on a chip. The NoC should provide high bandwidth and low latency, should consume low energy, and should be compact. However, all these requirements are at odds and require tradeoffs at all levels. In this chapter, we discuss issues and challenges for future NoCs with demands for high bandwidth and low energy. Next, we present details of how coupling packet-switched arbitration with circuit-switched data transfer can achieve these goals. In this hybrid network, packet-switched arbitration is used to reserve future circuit-switched channels for the data transfer, eliminating the performance bottlenecks associated with pure circuit-switched networks while maintaining their power advantage. Furthermore, proximity-based data streaming increases network throughput and improves energy efficiency. Measurements of this NoC in 45 nm CMOS are described to analyze de-sign tradeoffs.

Mark A. Anders, Himanshu Kaul, Ram K. Krishnamurthy, Shekhar Y. Borkar

Chapter 2. Run-Time Power-Gating Techniques for Low-Power On-Chip Networks

Abstract

Leakage power has already been consuming a considerable portion of the active power in recent process technologies. In this chapter, we survey various power gating techniques to reduce the leakage power of on-chip routers. Then we introduce a run-time fine-grained power-gating router, in which power supply to each router component (e.g., virtual-channel buffer, crossbar’s multiplexer, and output latch) can be individually controlled in response to the applied workload. The fine-grained power gating router with 35 micro-power domains is designed using a commercial 65 nm process and evaluated in terms of the area overhead, application performance, and leakage power reduction.

Hiroki Matsutani, Michihiro Koibuchi, Hiroshi Nakamura, Hideharu Amano

Chapter 3. Adaptive Voltage Control for Energy-Efficient NoC Links

Abstract

As we enter the many-core integration era driven by advances in multiprocessor system-on-chip innovations, interconnect emerges as the bottleneck in achieving energy efficiency in systems-on-chip. This chapter surveys the state-of-the-art in energy-efficient communication link design for NoCs. After reviewing techniques at the datalink and physical abstraction layers, we introduce a lookahead-based transition-aware adaptive voltage control method for achieving improved energy-efficiency at moderate cost in performance and reliability. Limitations of this method are evaluated and future prospects in energy-efficient link design are projected.

Paul Ampadu, Bo Fu, David Wolpert, Qiaoyan Yu

Chapter 4. Asynchronous Communications for NoCs

Abstract

Technology scaling beyond 90 nm drastically complicates the chip design process. Greater demand for higher performance and more functionality placed on a single chip, while maintaining power consumption at a reasonable level drives research towards new architectural and communication paradigms that support topological scaling. Network-on-Chip is seen as one such paradigm, but its inherently massive parallelism and distribution of switching activity naturally lead to a much wider spectrum of techniques used for system timing. The use of global clocking becomes very difficult for improving power and performance while at the same time keeping acceptable levels of robustness to faults, both fabrication and run time, as well as to the increasing parametric variability of components.

Systems based on NoCs are thus becoming more diverse in terms of timing, and if not fully asynchronous, then mixed, e.g. globally asynchronous and locally synchronous. The notion of timing and synchronization is pervasive in system communication architectures and affects all layers of hierarchy, but its biggest effect is probably at the link layer, where the notion of data validity in communication channels between processing nodes and network routers is paramount. This chapter provides an overview of the various asynchronous techniques that are used in such links, including signalling schemes, data encoding and synchronization solutions. Those are discussed with a view of comparison in terms of area, power and performance. The fundamental issues of the formation of data tokens based on the principles of data validity, acknowledgement, delay-insensitivity, timing assumptions and soft-error tolerance are considered. The chapter also covers some of the aspects related to combining asynchronous communication links to form parts of the entire network architecture, which involves asynchronous logic for arbitration and routing hardware. To this end, we also present basic techniques for building small-scale controllers using the formal models of petri nets and signal transition graphs.

Stanislavs Golubcovs, Alex Yakovlev

System-Level Design Techniques

Frontmatter

Chapter 5. Application-Specific Routing Algorithms for Low Power Network on Chip Design

Abstract

In the last few years, Network-on-Chip (NoC) has emerged as a dominant paradigm for synthesis of a multi-core Systems-on-Chip (SoC). A future NoC architecture must be general enough to allow volume production and must have features for specialization and configuration to match and meet the application’s power and performance requirements. This chapter describes how one important aspect, namely the routing algorithm, can be optimized in such NoC platforms. Routing algorithm has a major effect on the performance (packet latency and throughput) as well as power consumption of NoC. A methodology to develop efficient and deadlock free routing algorithms which are specialized for an application or a set of concurrent applications is presented. The methodology, called application-specific routing algorithms (APSRA), exploits the application-specific information regarding pairs of cores which communicate and other pairs which never communicate in the NoC platform. This information is used to maximize the adaptivity of the routing algorithm without compromising the important property of deadlock freedom. The chapter also presents an extensive comparison between the routing algorithms generated using APSRA methodology and general purpose deadlock-free routing algorithms. The simulation-based evaluations are performed using both synthetic traffic and traffic from real applications. The comparison embraces several performance indices such as degree of adaptiveness, average delay, throughput, power dissipation, and energy consumption. In spite of an adverse impact on router architecture, it is shown that the higher adaptivity of APSRA leads to significant improvements in both routing performance and energy consumption.

Maurizio Palesi, Rickard Holsmark, Shashi Kumar, Vincenzo Catania

Chapter 6. Adaptive Data Compression for Low-Power On-Chip Networks

Abstract

With the recent design shift toward increasing the number of processing elements in a chip, supports for low power, low latency, and high bandwidth in on-chip interconnect are essential. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in a complicated router design and a high interconnect power/area cost. In this chapter, we present a method to exploit a table-based data compression technique, relying on value patterns in cache traffic. Compressing a large packet into a small one saves power consumption by reducing required operations in network components and decreases contention by increasing the effective bandwidth of shared resources. The main challenges are providing a scalable implementation of tables and minimizing the latency overhead of compression. We propose a shared table scheme that needs one encoding and one decoding tables for each processing element, and a management protocol that does not require in-order delivery. This scheme eliminates table size dependence on a network size, which realizes scalability and reduces overhead cost of table for compression. Our simulation results are presented for 8-core and 16-core designs. Overall, our compression method improves the packet latency up to 44% with an average of 36% and reduces the network power consumption by 36% on average in 16-core tiled design.

Yuho Jin, Ki Hwan Yum, Eun Jung Kim

Chapter 7. Latency-Constrained, Power-Optimized NoC Design for a 4G SoC: A Case Study

Abstract

In this chapter, we examine the design process of a network on-chip (NoC) for a high-end commercial system on-chip (SoC) application. We present several design choices and focus on the power optimization of the NoC while achieving the required performance. Our design steps include module mapping and allocation of customized capacities to links. Unlike previous studies, in which point-to-point, per-flow timing constraints were used, we demonstrate the importance of using the application end-to-end traversal latency requirements during the optimization process. In order to evaluate the different alternatives, we report the synthesis results of a design that meets the actual throughput and timing requirements of the commercial SoC. According to our findings, the proposed technique offers up to 40% savings in the total router area, 49% savings in the inter-router wiring area, and a 16% reduction of total power for our target router architecture.

Rudy Beraha, Isask’har Walter, Israel Cidon, Avinoam Kolodny

Future and Emerging Technologies

Frontmatter

Chapter 8. Design and Analysis of NoCs for Low-Power 2D and 3D SoCs

Abstract

Networks-on-Chip (NoC), being a system-level interconnect, can play a major role in achieving low-power SoC designs. In many designs, the cores are grouped in to Voltage Islands (VIs). To reduce the leakage power consumption, an island containing cores that are not used in an application can be shutdown, while the other islands can still be operational. When one or more of the islands are shutdown, the interconnect should allow the communication between islands that are operational. For this, the NoCs has to be designed efficiently to allow shutdown of VIs, thereby reducing the leakage power consumption. In this chapter, we present methods to design NoC topologies that provide such a support for both 2D and 3D ICs. We show how the concept of VIs need to be considered during topology synthesis phase itself. We also make studies to show the benefits of migrating to 3D-stacked chips for realistic applications that have multiple VIs.

Ciprian Seiculescu, Srinivasan Murali, Luca Benini, Giovanni De Micheli

Chapter 9. CMOS Nanophotonics: Technology, System Implications, and a CMP Case Study

Abstract

The latency, bandwidth, and power consumption of on-chip interconnection networks are central concerns in the design of multi- and many-core microprocessors. When the global network-on-chip (NoC) is electrical, the power consumption and the limited connectivity caused by difficulties associated with global wires will limit network performance due to power or topology constraints unless applications can be written, which only require nearest neighbor communication. This is a highly unlikely scenario, and these performance and power barriers will become more severe with shrinking process technology and increased core counts. Emerging CMOS nanophotonic technologies provide a compelling alternative to traditional all-electronic NoCs. Wire power consumption is fundamentally linear with wire length. Due to the low loss nature of waveguides, optical data transmission primarily consumes energy at the endpoints where optical to electrical (OE) and electrical to optical (EO) conversion takes place. Therefore, the energy required to transport data is relatively independent of path length for path lengths of interest for NoC-based systems. Additionally, the use of wave division multiplexing can be exploited to improve per lane bandwidth. Signal integrity limitations make this option intractable for electrical NoCs. The result is that nanophotonic NoCs can provide both higher throughput and lower power consumption than all-electrical NoCs. This chapter introduces CMOS nanophotonic technology and considers its use in photonic chip-wide networks enabling many-core microprocessors with greatly enhanced performance and flexibility while consuming less power than their electrical counterparts. It provides, as a case study, a design that takes advantage of CMOS nanophotonics to achieve ten-teraop performance in a 256-core 3D chip stack, using optically connected main memory, very high memory bandwidth, cache coherence across all cores, no bisection bandwidth limits on communication, and cross-chip communication at very low latency with cache-line granularity.

Jung Ho Ahn, Raymond G. Beausoleil, Nathan Binkert, Al Davis, Marco Fiorentino, Norman P. Jouppi, Moray McLaren, Matteo Monchiero, Naveen Muralimanohar, Robert Schreiber, Dana Vantrease

Chapter 10. RF-Interconnect for Future Network-On-Chip

Abstract

In the era of the nanometer CMOS technology, due to stringent system requirements in power and performance, microprocessor manufacturers are relying more on chip multi-processor (CMP) designs. CMPs partition silicon real estate among a number of processor cores and on-chip caches, and these components are connected via an on-chip interconnection network (Network-on-chip). It is projected that communication via NoC is one of the primary limiters to both performance and power consumption. To mitigate such problems, we explore the use of multiband RF-interconnect (RF-I) which can communicate simultaneously through multiple frequency bands with low power signal transmission and reconfigurable bandwidth. At the same time, we investigate the CMOS mixed-signal circuit implementation challenges for improving the RF-I signaling integrity and efficiency. Furthermore, we propose a micro-architectural framework that can be used to facilitate the exploration of scalable low power NoC architectures based on physical planning and prototyping.

Sai-Wang Tam, Eran Socher, Mau-Chung Frank Chang, Jason Cong, Glenn D. Reinman

Backmatter

Title: Low Power Networks-on-Chip
Editors: Cristina Silvano
Marcello Lajolo
Gianluca Palermo
Publisher: Springer US
Electronic ISBN: 978-1-4419-6911-8
Print ISBN: 978-1-4419-6910-1
DOI: https://doi.org/10.1007/978-1-4419-6911-8

Springer Professional

About this book

Table of Contents

Frontmatter

Low-Level Design Techniques

Frontmatter

Chapter 1. Hybrid Circuit/Packet Switched Network for Energy Efficient on-Chip Interconnections

Chapter 2. Run-Time Power-Gating Techniques for Low-Power On-Chip Networks

Chapter 3. Adaptive Voltage Control for Energy-Efficient NoC Links

Chapter 4. Asynchronous Communications for NoCs

System-Level Design Techniques

Frontmatter

Chapter 5. Application-Specific Routing Algorithms for Low Power Network on Chip Design

Chapter 6. Adaptive Data Compression for Low-Power On-Chip Networks

Chapter 7. Latency-Constrained, Power-Optimized NoC Design for a 4G SoC: A Case Study

Future and Emerging Technologies

Frontmatter

Chapter 8. Design and Analysis of NoCs for Low-Power 2D and 3D SoCs

Chapter 9. CMOS Nanophotonics: Technology, System Implications, and a CMP Case Study

Chapter 10. RF-Interconnect for Future Network-On-Chip

Backmatter