2007 | Book

Designing Embedded Processors

A Low Power Perspective

Edited by: Jörg Henkel, Sri Parameswaran

Publisher: Springer Netherlands

About this book

As we embrace the world of personal, portable, and perplexingly complex digital systems, it falls to the designer to take advantage of the available transistors to produce a system that is small, fast, cheap, and correct, yet possesses increased functionality.

Increasingly, these systems also have to consume little energy. Designers are therefore turning to small, low-power processors and customizing them in both software and hardware to achieve a low-power system that is verified and has a short design turnaround time.

Designing Embedded Processors examines the many ways in which processor-based systems are designed to enable low-power devices. It looks at processor design methods, memory optimization, dynamic voltage scaling, compiler methods, and multiprocessor methods. Each section opens with an introductory chapter giving a broad view of the area, followed by a few specialist chapters offering a deeper perspective. The book provides a good starting point for engineers in the area and for research students embarking upon the exciting field of embedded systems and architectures.

Table of Contents

Frontmatter

Application Specific Embedded Processors

Frontmatter
Chapter 1. Application-Specific Embedded Processors
Today’s silicon technology allows building embedded processors as part of SoCs (systems-on-chip) comprising up to a billion transistors on a single die. Interestingly, real-world SoCs (produced in large quantities for mainstream applications) that utilize this potential complexity hardly exist. Another observation is that the semiconductor industry experienced an inflection point a couple of years ago: the number of ASIC (Application-Specific Integrated Circuit) design starts was outpaced by the number of design starts for Application-Specific Standard Products (ASSPs). Moreover, we might face a new design productivity gap: the “gap of complexity” (details and references follow later). Together, these observations reflect a transition in the way embedded processors are designed. This chapter reports on and analyzes current and possible future trends from the perspective of embedded system design, with an emphasis on design environments for so-called extensible processor platforms. It describes the state of the art in the three main steps of designing an extensible processor, namely code segment identification, extensible instruction generation, and architectural customization selection.
Jörg Henkel, Sri Parameswaran, Newton Cheung
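The first of those steps, code segment identification, typically starts from an execution profile. As a rough, hypothetical illustration (not the chapter's actual algorithm; the coverage threshold and profile data below are invented), the following sketch picks the smallest set of basic blocks that covers most of an application's dynamic execution, yielding candidates for custom instructions:

```python
# Hypothetical sketch of code segment identification: rank profiled basic
# blocks by their share of total dynamic execution and keep the hottest ones
# as candidates for custom instructions. Threshold and data are illustrative.

def hot_segments(profile, coverage=0.8):
    """profile: {block_id: dynamic_instruction_count}. Returns the smallest
    set of blocks that together cover `coverage` of all executed instructions."""
    total = sum(profile.values())
    picked, covered = [], 0
    for block, count in sorted(profile.items(), key=lambda kv: -kv[1]):
        picked.append(block)
        covered += count
        if covered / total >= coverage:
            break
    return picked

profile = {"bb3": 9_000_000, "bb7": 4_500_000, "bb1": 400_000, "bb9": 100_000}
print(hot_segments(profile))  # ['bb3', 'bb7'] cover >80% of execution
```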
Chapter 2. Low-Power Design with NISC Technology
The power consumption of an embedded application can be reduced by moving as much computation as possible from run time to compile time, and by customizing the microprocessor architecture to minimize the number of cycles. This chapter introduces a new generation of processors, called No-Instruction-Set-Computer (NISC), that gives the compiler full control of the datapath in order to simplify the controller hardware and enable fast architecture customization.
Bita Gorjiara, Mehrdad Reshadi, Daniel Gajski
Chapter 3. Synthesis of Instruction Sets for High-Performance and Energy-Efficient ASIP
Several techniques have been proposed to reduce the energy consumption of ASIPs (Application-Specific Instruction-set Processors). While those techniques can reduce the energy consumption with minimal change to the instruction set (IS), they often fail to exploit the opportunity of designing the entire IS from an energy-efficiency perspective. In this chapter we present an energy-efficient IS synthesis approach that can comprehensively reduce the energy-delay product (EDP) of ASIPs through optimal instruction encoding, considering both the instruction bitwidth and the dynamic instruction fetch count. Experimental results with a typical embedded RISC processor show that the proposed technique can generate application-specific ISs that are up to 40% more energy-efficient than the native IS for several application benchmarks.
Jong-Eun Lee, Kiyoung Choi, Nikil D. Dutt
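A minimal sketch of the kind of encoding cost model the abstract alludes to, under assumed numbers (the per-bit fetch energy, instruction mix, and bitwidths below are illustrative, not values from the chapter):

```python
# Minimal sketch: total fetch energy depends on each instruction's encoded
# bitwidth weighted by how often it is fetched at run time. All numbers and
# the energy-per-bit constant are assumptions, not values from the chapter.

E_PER_BIT = 1.0  # assumed relative fetch energy per instruction bit

def fetch_energy(encoding, fetch_counts):
    """encoding: {opcode: bitwidth}; fetch_counts: {opcode: dynamic count}."""
    return sum(E_PER_BIT * encoding[op] * n for op, n in fetch_counts.items())

counts = {"load": 5_000_000, "add": 3_000_000, "branch": 2_000_000}
fixed_32 = {op: 32 for op in counts}              # native fixed-width IS
tuned = {"load": 16, "add": 16, "branch": 24}     # frequent ops get short codes

e0, e1 = fetch_energy(fixed_32, counts), fetch_energy(tuned, counts)
print(f"fetch energy reduction: {1 - e1 / e0:.0%}")  # 45% fewer bits fetched
```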
Chapter 4. A Framework for Extensible Processor Based MPSoC Design
Multiprocessor system-on-chip (MPSoC) architectures have emerged as a popular solution to the ever-increasing performance requirements of embedded systems. MPSoC architectures that are customized to a specific application or domain have the potential to achieve very high performance while requiring low power consumption. The recent emergence of extensible processors has greatly facilitated the design of efficient yet flexible application-specific processors, making them a promising building block for MPSoC architectures. However, the inter-dependent multiprocessor, co-processor, and custom instruction design problems result in a huge design space. Therefore, efficient tools are needed that assist designers in creating high-quality architectures in limited time. In this chapter, we describe a framework that generates extensible processor based MPSoC architectures for a given application by synergistically exploring custom instruction, co-processor, and multiprocessor optimizations. The framework automatically maps embedded applications to MPSoC architectures, aiming to minimize application execution time and energy consumption while keeping the overall area of the MPSoC within a given budget.
Fei Sun, Srivaths Ravi, Anand Raghunathan, Niraj K. Jha
Chapter 5. Design and Run Time Code Compression for Embedded Systems
Compression has long been utilized in electronic systems to improve performance, reduce transmission costs, and minimize code size. In this chapter we present two separate techniques for compressing instructions. The first compresses instruction traces so that the compressed trace can be used to explore the best cache configuration for an embedded system; trace compression enables rapid cache design-space exploration. The second stores compressed instructions in memory, expanding them just before execution in the processor; this enables a smaller code footprint and reduced power consumption. This chapter explains both methods and shows the benefits of these two orthogonal approaches to the design of an embedded system.
Sri Parameswaran, Jörg Henkel, Andhi Janapsatya, Talal Bonny, Aleksandar Ignjatovic
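To make the second idea concrete, here is a generic dictionary-based compression sketch in the spirit of, but not identical to, the chapter's schemes: frequent instruction words are replaced by short dictionary indices and expanded again before execution.

```python
# Generic dictionary-based code compression sketch (illustrative, not the
# chapter's exact scheme): frequent instruction words become short dictionary
# indices; decompression restores the original stream before execution.
from collections import Counter

def compress(instrs, dict_size=4):
    dictionary = [w for w, _ in Counter(instrs).most_common(dict_size)]
    index = {w: i for i, w in enumerate(dictionary)}
    # Each instruction becomes ('d', idx) if in the dictionary, else ('raw', w).
    stream = [("d", index[w]) if w in index else ("raw", w) for w in instrs]
    return dictionary, stream

def decompress(dictionary, stream):
    return [dictionary[x] if tag == "d" else x for tag, x in stream]

code = [0xE2800001, 0xE5901000, 0xE2800001, 0xE2800001, 0xEA000000]
d, s = compress(code)
assert decompress(d, s) == code  # lossless round trip
```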

Embedded Memories

Frontmatter
Chapter 6. Power Optimisation Strategies Targeting the Memory Subsystem
Power optimisations targeting the memory subsystem have received considerable attention in recent years because of the dominant role played by memory in the overall system power. The more complex the application, the greater the volume of instructions and data involved, and hence, the greater the significance of issues involving power-efficient storage and retrieval of these instructions and data. In this chapter we give a brief overview of how memory architecture and accesses affect system power dissipation, and some recent proposals on reducing memory-related power through diverse mechanisms: optimisations of the traditional cache memory system, architectural innovations targeting application specific designs, compiler optimisations, and other techniques.
Preeti Ranjan Panda
Chapter 7. Layer Assignment Techniques for Low Energy in Multi-Layered Memory Organizations
Nearly all platforms use a multi-layer memory hierarchy to bridge the enormous latency gap between large off-chip memories and local register files. However, most previous work on hardware- or software-controlled layer assignment has focused mainly on performance. As a result, the intermediate layers have been assigned overly large sizes, leading to energy inefficiency. In this chapter we present a technique that takes advantage of both the temporal locality and the limited lifetime of the application's arrays to trade off performance and energy consumption under layer size constraints. The resulting tradeoff points are so-called Pareto points: solutions that are not only optimal in energy or time, but also intermediate points for which it is impossible to gain energy without losing time, or vice versa. A prototype tool has been developed and tested using two real-life applications of industrial relevance. Following this approach we have been able to halve the energy consumed by the data memory hierarchy for each of our driver applications.
Erik Brockmeyer, Bart Durinck, Henk Corporaal, Francky Catthoor
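The Pareto-point definition in the abstract translates directly into a small filter over candidate solutions; a minimal sketch with invented (time, energy) data:

```python
# Minimal Pareto-point filter: keep only (time, energy) assignments for which
# no other assignment is at least as good in both dimensions and different.
# The candidate data is illustrative.

def pareto(points):
    """points: list of (time, energy). Returns the non-dominated subset."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

candidates = [(10, 50), (12, 30), (15, 25), (13, 35), (14, 40)]
print(sorted(pareto(candidates)))  # [(10, 50), (12, 30), (15, 25)]
```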
Chapter 8. Memory Bank Locality and Its Usage in Reducing Energy Consumption
Bank locality can be defined, in the context of a multi-bank memory system, as localizing the load/store accesses at a given time to a small set of memory banks. An optimizing compiler can modify a given input code to improve its bank locality. There are several practical advantages to enhancing bank locality, the most important of which is reduced memory energy consumption. Recent trends indicate that energy consumption is fast becoming a first-order design parameter as processor-based systems continue to become more complex and multi-functional. Off-chip memory energy consumption in particular can be a limiting factor in many embedded system designs. This chapter presents a novel compiler-based strategy for maximizing the benefits of the low-power operating modes available in some recent DRAM-based multi-bank memory systems. In this strategy, the compiler uses linear algebra to represent and optimize bank locality in a mathematical framework. We show that exploiting bank locality can be cast as loop (iteration space) and array layout (data space) transformations. We also present experimental data showing the effectiveness of our optimization strategy; the results show that exploiting bank locality can yield large energy savings.
Mahmut Kandemir
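Why clustering accesses saves energy can be seen from a toy power model (the bank count and per-mode power figures below are assumptions, not numbers from the chapter): banks not touched in a window can sit in a low-power mode, so packing the same accesses into fewer banks leaves more banks idle.

```python
# Illustrative model of the bank-locality benefit: idle banks drop to a
# low-power mode. All energy numbers are assumed, for illustration only.

ACTIVE_MW, LOWPOWER_MW, BANKS = 300.0, 3.0, 8

def window_power(banks_touched):
    idle = BANKS - banks_touched
    return banks_touched * ACTIVE_MW + idle * LOWPOWER_MW

# Same access volume, before and after a bank-locality transformation:
spread = window_power(banks_touched=6)     # accesses scattered over 6 banks
clustered = window_power(banks_touched=2)  # loop/layout transforms: 2 banks
print(f"power reduced by {1 - clustered / spread:.0%}")  # ~66%
```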

Dynamic Voltage and Frequency Scaling

Frontmatter
Chapter 9. Fundamentals of Power-Aware Scheduling
Power-aware scheduling plays a key role in curtailing the energy consumption of real-time embedded systems. Since real-time embedded systems vary widely in composition and functionality, different power-aware scheduling techniques are naturally needed. However, certain fundamental principles are applicable to all such systems. This chapter provides an overview of the basics of the power/performance tradeoff and of real-time system scheduling. It also illustrates the benefit of power-aware scheduling via a simple example. A categorization of different power-aware scheduling techniques is presented at the end.
Xiaobo Sharon Hu, Gang Quan
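The tradeoff the chapter builds on rests on the standard CMOS dynamic-power relation (a textbook identity, not a formula specific to this chapter):

```latex
P_{\mathrm{dyn}} = \alpha\, C\, V_{dd}^{2}\, f,
\qquad
E = P_{\mathrm{dyn}} \cdot t \;\propto\; \alpha\, C\, V_{dd}^{2} \times (\text{executed cycles})
```

Since the maximum feasible f scales roughly with V_dd, halving frequency and voltage together roughly doubles execution time but cuts energy by about a factor of four; trading available slack for lower voltage is exactly the exchange power-aware scheduling exploits.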
Chapter 10. Static DVFS Scheduling
DVFS processors, if used properly, can dramatically reduce the energy consumption of real-time systems employing such processors. In this chapter, two static (off-line) voltage/frequency selection techniques are presented that maximally exploit the energy-saving benefit provided by DVFS processors. The first technique targets a popular dynamic-priority task scheduling algorithm, the Earliest Deadline First algorithm, while the second is applicable to any fixed-priority task scheduling algorithm. Other related work is reviewed at the end of the chapter.
Gang Quan, Xiaobo Sharon Hu
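For the EDF case there is a classic static result worth sketching (stated here as a simplified illustration, assuming continuous frequency scaling, not as the chapter's exact algorithm): a periodic task set is EDF-schedulable as long as utilization stays at or below 1, so the lowest safe constant frequency is the one that scales utilization up to exactly 1.

```python
# Classic static-DVFS sketch for EDF (simplified; assumes continuous
# frequencies and that frequency scales computation time linearly).

def static_edf_frequency(tasks):
    """tasks: list of (worst_case_cycles_at_fmax, period). Returns the minimal
    constant frequency as a fraction of f_max."""
    utilization = sum(c / p for c, p in tasks)
    return min(1.0, utilization)  # run no slower than the workload requires

tasks = [(2, 10), (3, 15), (1, 20)]  # utilizations 0.2 + 0.2 + 0.05
print(static_edf_frequency(tasks))   # 0.45 -> run at 45% of f_max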
Chapter 11. Dynamic DVFS Scheduling
As discussed in the previous chapter, offline analysis can be used to generate a schedule of DVFS state changes to minimize energy consumption, while ensuring sufficient processing cycles are available for all tasks to meet their deadlines, even under worst-case computation requirements. However, invocations of real-time tasks typically use less than their specified worst-case computation requirements, presenting an opportunity for further energy conservation. This chapter outlines three online, dynamic techniques to more aggressively scale back processing frequency and voltage to conserve energy when task computation cycles vary, yet continue to provide timeliness guarantees for worst-case execution time scenarios.
Padmanabhan S. Pillai, Kang G. Shin
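One simple online idea in the spirit of this chapter (a sketch, not any of its three specific techniques): when a job finishes under its worst-case budget, the leftover time becomes slack that lets the next job run at a lower frequency while still finishing by its original worst-case completion point.

```python
# Dynamic slack-reclamation sketch (illustrative; frequency is a fraction of
# f_max and is assumed to scale execution time linearly).

def reclaimed_frequency(wcet_cycles, slack_time, f_max=1.0):
    """Budgeted time at f_max is wcet_cycles / f_max; adding slack lets us
    stretch the same worst-case cycles over a longer window, more slowly."""
    budget = wcet_cycles / f_max + slack_time
    return wcet_cycles / budget

# Job A was budgeted 4 ms of cycles but finished in 2.5 ms -> 1.5 ms of slack.
f_next = reclaimed_frequency(wcet_cycles=4.0, slack_time=1.5)
print(f"run next job at {f_next:.0%} of f_max")  # ~73%
```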
Chapter 12. Voltage Selection for Time-Constrained Multiprocessor Systems
Dynamic voltage selection and adaptive body biasing have been shown to reduce dynamic and leakage power consumption effectively. In this chapter we present an energy optimization approach for time-constrained applications implemented on multiprocessor systems. We start by introducing a genetic algorithm that performs the mapping and scheduling of the application on the target hardware architecture. Then, we discuss in detail several voltage selection algorithms that explicitly take into account the transition overheads implied by changing voltage levels.
Alexandru Andrei, Petru Eles, Zebo Peng, Marcus Schmitz, Bashir M. Al-Hashimi

Compiler Techniques

Frontmatter
Chapter 13. Compilation Techniques for Power, Energy, and Thermal Management
In addition to hardware- and operating-system-directed techniques, compiler-directed power, energy, and thermal management has gained increasing importance. This chapter discusses the potential benefits of compiler-based approaches to the power/energy/thermal management problem. The ability of the compiler to reshape program behavior through aggressive, whole-program optimizations, and to predict future program behavior, can give it an advantage over hardware and operating system techniques. This chapter introduces several optimization metrics, together with state-of-the-art optimizations that target them.
Ulrich Kremer
Chapter 14. Compiler-Directed Dynamic CPU Frequency and Voltage Scaling
This chapter presents the design, implementation, and evaluation of a compiler algorithm that effectively optimizes programs for energy usage using dynamic voltage and frequency scaling (DVFS). The algorithm identifies program regions where the CPU can be slowed down with negligible performance loss, and has been implemented as a source-to-source compiler transformation using the SUIF2 compiler infrastructure. Physical measurements on a notebook computer show that total system energy savings of up to 28% can be achieved with performance degradation of less than 5% for the SPEC CPU95 benchmarks. On average, the system energy and energy-delay product are reduced by 11% and 9%, respectively, with a performance slowdown of 2%.
Chung-Hsing Hsu, Ulrich Kremer
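The core idea (slowing the CPU in regions dominated by memory stalls) can be illustrated with a toy model; this is a simplification for intuition, not the SUIF2 pass itself, and the overlap model and numbers are assumptions:

```python
# Toy region-selection model: in a memory-bound region, CPU work overlaps
# memory stalls, so frequency can drop until region time would exceed the
# allowed slowdown. Assumes time at frequency fraction x is
# max(cpu_busy/x, mem_stall), i.e. full overlap; purely illustrative.

def region_frequency(cpu_busy_s, mem_stall_s, max_slowdown=0.05):
    base = max(cpu_busy_s, mem_stall_s)   # region time at full frequency
    budget = (1 + max_slowdown) * base    # longest tolerable region time
    return min(1.0, cpu_busy_s / budget)  # slowest frequency within budget

# A heavily memory-bound region: 0.2 s of CPU work under 1.0 s of stalls.
print(f"{region_frequency(0.2, 1.0):.2f}")  # 0.19 -> run region at ~19% f_max
```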
Chapter 15. Link Idle Period Exploitation for Network Power Management
Network power optimization is becoming increasingly important as the sizes of the data manipulated by parallel applications and the complexity of interprocessor data communication continue to increase. Several hardware-based schemes have been proposed in the past for reducing network power consumption, either by turning off unused communication links or by lowering voltage/frequency on links with low usage. While prior research shows that these schemes can be effective in certain cases, they share the common drawback of not being able to predict link active and idle times very accurately. This chapter instead proposes a compiler-based scheme that determines the last use of communication links in each loop nest and inserts explicit link turn-off calls in the application source. Specifically, for each loop nest, the compiler inserts a turn-off call per communication link; each turned-off link is reactivated upon the next access to it. We automated this approach within a parallelizing compiler and applied it to eight array-intensive embedded applications. Our experimental analysis reveals that the proposed approach is very promising from both performance and power perspectives. In particular, it saves more energy than a pure hardware-based scheme while incurring a much smaller performance penalty.
Feihui Li, Guangyu Chen, Mahmut Kandemir, Mustafa Karakoy
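The transformation the abstract describes can be sketched as a last-use analysis over a loop nest's statements; the IR representation, link names, and the link_off call below are hypothetical stand-ins for whatever the compiler actually emits:

```python
# Sketch of compiler-inserted link turn-offs: after the last use of each
# communication link within a loop nest, insert an explicit turn-off call.
# Statement/link representation and call name are hypothetical.

def insert_link_turnoffs(loop_nest_stmts):
    """loop_nest_stmts: list of (stmt, links_used). Returns a new statement
    list with 'link_off(l)' inserted right after each link's last use."""
    last_use = {}
    for i, (_, links) in enumerate(loop_nest_stmts):
        for link in links:
            last_use[link] = i
    out = []
    for i, (stmt, _) in enumerate(loop_nest_stmts):
        out.append(stmt)
        out += [f"link_off({l})" for l, j in last_use.items() if j == i]
    return out

nest = [("send(A, p1)", {"L0"}), ("send(B, p2)", {"L1"}), ("send(C, p1)", {"L0"})]
print(insert_link_turnoffs(nest))
# ['send(A, p1)', 'send(B, p2)', 'link_off(L1)', 'send(C, p1)', 'link_off(L0)']
```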
Chapter 16. Remote Task Mapping
The widespread deployment of wireless LANs offers users of handheld devices opportunities to access not only public information over the internet but also resources on their own desktop computers or trusted servers. Such resources include data, storage, and CPU, among others. The discussion in this chapter focuses on remote task mapping for the purpose of offloading computational tasks from resource-constrained handheld devices to resource-rich desktop computers and servers. The main objective is to reduce both the time and the energy required to accomplish the tasks. Compiler techniques used to analyze an ordinary program before transforming it into an efficient client-server distributed program are presented, along with a set of experimental results.
Zhiyuan Li, Cheng Wang
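At its core, offloading is profitable only when shipping a task's inputs and outputs costs less time and energy than computing locally. A hypothetical decision sketch (every parameter below is an assumption for illustration, not a model from the chapter):

```python
# Hypothetical offload decision: run remotely only if the transfer-plus-server
# path beats local execution in both time and energy. Radio is assumed active
# only during transfer; all device parameters are illustrative.

def should_offload(local_cycles, cpu_hz, p_cpu_w,
                   bytes_xfer, net_bps, p_radio_w, server_time_s):
    t_local = local_cycles / cpu_hz
    e_local = t_local * p_cpu_w
    t_xfer = bytes_xfer * 8 / net_bps
    t_remote = t_xfer + server_time_s
    e_remote = t_xfer * p_radio_w
    return t_remote < t_local and e_remote < e_local

# 2 Gcycles on a 400 MHz handheld CPU vs shipping 1 MB over 11 Mb/s Wi-Fi:
print(should_offload(2e9, 400e6, 0.8, 1e6, 11e6, 1.2, server_time_s=0.2))  # True
```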

Multi-Processors

Frontmatter
Chapter 17. A Power and Energy Perspective on MultiProcessors
In the past few years, we have seen the rise of multiprocessor and multicore approaches to system-on-chip and processor design, driven by performance, power dissipation, and energy consumption motivations. In fact, there is considerable confusion over the various architectural choices for multiprocessor systems, and the primary design methods that are appropriate to each choice. The techniques for processor design and power and energy reduction that have been discussed in previous chapters of this book are often orthogonal to the basic architectural choices in multiprocessor systems, so that these techniques can be combined together in designing multiprocessor or multicore SoCs. In this chapter, we will review the main approaches to multiprocessor architecture and their impact on power and energy.
Grant Martin
Chapter 18. System-Level Design of Network-on-Chip Architectures
Future multi-processor system-on-chip (MPSoC) architectures will be implemented in sub-50 nm technologies and will include tens to hundreds of processing-element blocks operating in the multi-GHz range. The on-chip interconnection network will be a key factor in determining the performance and power consumption of these multi-core devices. Packet-switched interconnection networks, or Networks-on-Chip (NoC), have emerged as an attractive alternative to traditional bus-based architectures for satisfying the communication requirements of these MPSoC architectures. The key challenge in NoC design is to produce a complex, high-performance, and low-energy architecture under tight time-to-market requirements. NoC architectures must support the communication demands of hundreds of cores under stringent performance constraints. In addition to this complexity, NoC designers must also contend with the physical challenges of design in nanoscale technologies. The NoC design problem entails a joint optimization of the system-level floorplan and the power consumption of the network. All these factors, coupled with the requirement for short turnaround times, raise the need for an intellectual property (IP) re-use methodology that is well supported with design and optimization techniques and performance evaluation models. This chapter introduces the concept of NoC and presents the various elements of the IP-based system-level methodology required for its design.
Karam S. Chatha, Krishnan Srinivasan
Chapter 19. Power-Performance Modeling and Design for Heterogeneous Multiprocessors
As single-chip systems are increasingly composed of heterogeneous multiprocessors, an opportunity exists to explore new levels of low-power design. At the chip/system level, any processor is capable of executing any program (or task), with only differences in performance. When the system executes a variety of different task sets (loadings), the problem becomes one of establishing the cost and benefit of matching task types to processor types under the task loads anticipated for the system. This includes not only static task mapping, but also dynamic scheduling decisions as well as the selection of the most appropriate set of processors for the system. In this chapter, we consider what models are appropriate for establishing system-level power-performance tradeoffs and propose some early design strategies at this new level of design.
JoAnn M. Paul, Brett H. Meyer
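The matching problem the abstract poses can be made concrete with a small table-driven sketch: given per-pair time and power figures, pick for each task type the processor type minimizing an energy-delay cost. The profile values below are purely illustrative assumptions.

```python
# Sketch of task-type to processor-type matching by energy-delay product.
# (time_s, power_w) per (task_type, processor_type) pair: assumed data.
PROFILE = {
    ("fft",   "dsp"): (0.010, 0.20), ("fft",   "risc"): (0.040, 0.15),
    ("parse", "dsp"): (0.050, 0.20), ("parse", "risc"): (0.020, 0.15),
}

def best_processor(task, processors=("dsp", "risc")):
    def edp(proc):
        t, p = PROFILE[(task, proc)]
        return (t * p) * t  # energy * delay
    return min(processors, key=edp)

print(best_processor("fft"), best_processor("parse"))  # dsp risc
```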

Reconfigurable Computing

Frontmatter
Chapter 20. Basics of Reconfigurable Computing
This chapter introduces the basic concepts of Reconfigurable Computing and its disruptive impact on the classical instruction-stream-based mindset of computing sciences. It illustrates the essentials of the paradigm shift brought about by Reconfigurable Computing and explains the mechanisms behind the massive speed-ups obtained by software-to-configware migration.
Reiner Hartenstein, TU Kaiserslautern
Chapter 21. Dynamic Reconfiguration
The adaptivity of electronic systems to their environment and internal system status enables different applications to be processed in time slots on one reconfigurable hardware architecture. On-demand adaptivity in response to unpredictable requirements from the user or the environment means optimizing power dissipation and performance at run time by providing on-chip computation capacity. Traditional microprocessor-based electronic systems can adapt their software to the required tasks of an application. The disadvantage of such systems is the sequential processing of the software code and the fixed hardware architecture, which does not allow internal structures to be adapted to optimize data throughput. Reconfigurable hardware, in this section Field Programmable Gate Arrays (FPGAs), allows the hardware architecture to be adapted at both design time and run time, as well as the software integrated within the included soft-core IP processor cores.
Jürgen Becker, Michael Hübner
Chapter 22. Applications, Design Tools and Low Power Issues in FPGA Reconfiguration
Dynamic reconfiguration allows the circuit configured on an FPGA to be optimized to the required function and performance constraints. Traditionally, designers have used dynamic reconfiguration to increase the speed or decrease the area of their systems. This chapter considers a variety of use models for lowering the power requirements of dynamically reconfigurable processor systems. We begin with a review of FPGA reconfiguration and the applications to which it has been applied so far. We then describe the tool flow for reconfiguration and expand the discussion, first, to present experimental results from the literature that characterize the power profile of reconfiguration itself and, second, to review system-level use models of dynamic reconfiguration that may improve overall system power requirements.
Adam Donlin
Backmatter
Metadata
Title
Designing Embedded Processors
Edited by
Jörg Henkel
Sri Parameswaran
Copyright Year
2007
Publisher
Springer Netherlands
Electronic ISBN
978-1-4020-5869-1
Print ISBN
978-1-4020-5868-4
DOI
https://doi.org/10.1007/978-1-4020-5869-1
