
2006 | Book

Embedded Computer Systems: Architectures, Modeling, and Simulation

6th International Workshop, SAMOS 2006, Samos, Greece, July 17-20, 2006. Proceedings

Edited by: Stamatis Vassiliadis, Stephan Wong, Timo D. Hämäläinen

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Keynotes

Reconfigurable Platform for Digital Convergence Terminals

It is apparent that future IT terminals, including handsets, will be multi-mode convergence devices. Therefore it becomes more and more important to be able to devise a low-power platform which is flexible enough to implement multiple different basebands on top of it. Moreover, real-time reconfigurability is crucial, considering that technologies keep evolving and over-the-air software/firmware upgrades are being required. In this paper, a new type of reconfigurable platform will be discussed, and we will see how it helps end-user device manufacturers deliver better multi-mode terminals with a better maintenance scheme.

Jinsung Choi
European Research in Embedded Systems

Digital information technology has revolutionized the world within less than four decades. It has taken the step from mainframe computers, mainly operated as hosts in computing centres, to desktops and laptops, connected by networks and found nearly on all office desks and tables today. Computers have become every day tools deeply integrated into all kinds of activities of our life.

Panagiotis Tsarchopoulos

System Design and Modeling

Interface Overheads in Embedded Multimedia Software

The multimedia capabilities in battery powered mobile communication devices should be provided at high energy efficiency. Consequently, the hardware is usually implemented using low-power technology and the hardware architectures are optimized for embedded computing. Software architectures, on the other hand, are not embedded system specific, but closely resemble each other for any computing device. The popular architectural principle, software layering, is responsible for much of the overheads, and explains the stagnation of active usage times of mobile devices. In this paper, we consider the observed developments against the needs of multimedia applications in mobile communication devices and quantify the overheads in reference implementations.

Tero Rintaluoma, Olli Silven, Juuso Raekallio
A UML Profile for Asynchronous Hardware Design

In this work we present UML for Hardware Design (UML-HD), a UML profile suitable for Asynchronous Hardware Design and an approach for automatically generating a Hardware Description Language (HDL) model from UML-HD models. A UML-HD model comprises solely class diagrams and an action language. We use stereotypes in two categories – structure and activity – to categorise classes. Structure type stereotypes signify state and activity type signify transitions. The approach is largely inspired by Petri nets. Several model transformations are suggested in this paper, but only code generation to Haste was implemented.

Kim Sandström, Ian Oliver
Automated Distribution of UML 2.0 Designed Applications to a Configurable Multiprocessor Platform

This paper presents automated distribution of embedded real-time applications modeled in Unified Modeling Language version 2.0 (UML 2.0). The automated distribution requires methods and tools for design automation, as well as the run-time environment for the distributed execution on the target platform. Executable application code is generated from UML models, and UML with a custom profile is used to abstract hardware architecture and configure application mapping. For experimentation, a full-featured WLAN terminal was designed in UML and implemented as a distributed multiprocessor system-on-chip (SoC) on an FPGA prototype platform. Measurements show that a 50-70% reduction in protocol delays is achieved with distribution, and delay variations are reduced by 45-85%.

Mikko Setälä, Petri Kukkala, Tero Arpinen, Marko Hännikäinen, Timo D. Hämäläinen
Towards a Transformation Chain Modeling Language

The Model Driven Development (MDD) paradigm stimulates the use of models as the main artifacts for software development. These models can be situated at high levels of abstraction, close to the application’s business domain. Many consecutive automatic transformations (a transformation chain) can be applied to these models to add the necessary details in order to generate a concrete implementation. This means that a large part of the total development effort is relocated to the development of transformations and hence we should have the necessary tooling support for designing transformation chains. In this paper we propose a metamodel for a transformation chain modeling language that enables implementation independent composition of transformations. We also propose a concrete syntax for this language that is based on UML activity diagrams.

Bert Vanhooff, Stefan Van Baelen, Aram Hovsepyan, Wouter Joosen, Yolande Berbers
Key Research Challenges for Successfully Applying MDD Within Real-Time Embedded Software Development

Model-Driven Development (MDD) is a software development paradigm that promotes the use of models at different levels of abstraction and performs transformations between them to derive one or more concrete application implementations. In this paper we analyze the current status of MDD regarding its applicability for the development of Real-Time Embedded Software (RTES). We discuss different modeling framework approaches used to specify the various models, and compare OMG/MDA-based approaches (MOF, UML Profiles and executable UML) with a generic MDD-based approach (GME). Finally, we identify the key challenges for future MDD research in order to successfully apply MDD within RTES development. These challenges are mainly situated in the field of modeling and standardization of abstraction levels, model transformations and code generation, traceability, and integration of existing software within the MDD development process.

Aram Hovsepyan, Stefan Van Baelen, Bert Vanhooff, Wouter Joosen, Yolande Berbers
Domain-Specific Modeling of Power Aware Distributed Real-Time Embedded Systems

This paper provides two contributions to the research on applying domain-specific modeling languages to distributed real-time embedded (DRE) systems. First, we present the Alderis platform-independent visual language for component-based system development. Second, we demonstrate the use of the Alderis language on a helicopter autopilot DRE design. The Alderis language is based on the concept of platform-based design, and explicitly captures asynchronous event-driven component interactions as well as the underlying platform for the computation. Unlike most modeling languages, Alderis has formally defined semantics providing a way for the formal verification of dense real-time properties and energy consumption.

Gabor Madl, Nikil Dutt
Mining Dynamic Document Spaces with Massively Parallel Embedded Processors

Océ is currently investigating future document management services. One of these services is accessing dynamic document spaces, i.e. improving the access to document spaces which are frequently updated (like newsgroups). This process is rather computationally intensive.

This paper describes the research conducted on software development for massively parallel processors. A prototype has been built which processes streams of information from specified newsgroups and transforms them into personal information maps.

Although this technology does speed up the training part compared to a general-purpose processor implementation, its real benefits emerge with larger problem dimensions because of the scalable approach.

Jan W. M. Jacobs, Rui Dai, Gerard J. M. Smit
Efficient Automated Clock Gating Using CoDeL

We present a highly efficient automated clock gating platform for rapidly developing power efficient hardware architectures. Our language, called CoDeL, allows hardware description at the algorithm level, and thus dramatically reduces design time. We have extended CoDeL to automatically insert clock gating at the behavioral level to reduce dynamic power dissipation in the resulting architecture. This is, to our knowledge, the first hardware design environment that allows an algorithmic description of a component and yet produces a power aware design. To estimate the power savings, we have developed an estimation framework, which is shown to be consistent with the power savings obtained using statistical power analysis using Synopsys tools. To evaluate our platform we use the CoDeL implementation of a counter and various integer transforms used in the realm of DSP (Digital Signal Processing): discrete wavelet transform, discrete cosine transform and an integer transform used in the H.264 (MPEG4 Part 10) video compression standard. These designs are then clock gated using CoDeL and Synopsys. A simulation based power analysis on the designed circuits shows that CoDeL’s clock gating performs better than Synopsys’ automated clock gating. CoDeL reduces the power dissipation by 83% on average, while Synopsys gives 81% savings.

Nainesh Agarwal, Nikitas J. Dimopoulos
An Optimization Methodology for Memory Allocation and Task Scheduling in SoCs Via Linear Programming

Applications for systems-on-chip are becoming more and more complex. The number of available components (DSPs, ASICs, memories, etc.) is also rising continuously. These facts necessitate a structured method for selecting components, mapping applications, and evaluating the chosen configuration and mapping. In this work we present a methodology for the last of these. We consider optimization of memory allocation and task scheduling as a packing problem and minimize the needed memory area. The results can be used as one element of an automated performance analysis for a given system on a high abstraction level. This analysis is essential for establishing a framework that iterates over a large quantity of possible systems. We illustrate the results using a part of the H.264 codec as an example. Furthermore, we show that results can be retrieved fast compared to other NP-hard problems due to an intelligent formulation of conditions within the linear program.

Bastian Ristau, Gerhard Fettweis

Wireless Sensor Networks

Designing Wireless Sensor Nodes

Wireless sensor networks are networks of large quantities of compact microsensors with wireless communication capability. Emerging applications of data gathering range from the environmental to the military. Architectural challenges are posed for designers such as computational power, energy consumption, energy sources, communication channels and sensing capabilities. This work presents the current state-of-the-art for wireless sensor nodes, investigating and analyzing these challenges. We discuss the characteristics and requirements for a sensor node. A comprehensive comparative study of sensor node platforms, energy management techniques, off-the-shelf microcontrollers, battery types and radio devices is presented.

Marcos A. M. Vieira, Adriano B. da Cunha, Diógenes C. da Silva Jr.
Design, Implementation, and Experiments on Outdoor Deployment of Wireless Sensor Network for Environmental Monitoring

This paper presents the design, implementation, and practical real-world experiments of an energy-optimized multi-hop wireless sensor network (WSN) targeted at environmental monitoring. The WSN is fully autonomous and consists of energy-efficient and scalable communication protocols and a low-power hardware platform. Software tools are developed for configuring and analyzing large-scale networks. The network has been deployed in an outdoor environment, consisting of 20 nodes covering an area of over 2 km². The results show that the multi-hop network works autonomously, reacts to environmental changes, and is able to operate at temperatures down to -30 °C. The hardware nodes operating on the 433 MHz frequency provide over 1 km communication distances, while still having sufficient throughput and low energy consumption. The deployed nodes had a lifetime of 6 months with a 1600 mAh battery, while generating 4 packets per minute.

Jukka Suhonen, Mikko Kohvakka, Marko Hännikäinen, Timo D. Hämäläinen
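
The lifetime figures quoted in the abstract above imply a rough per-node energy budget. The following is a back-of-the-envelope sketch, not a result from the paper: it assumes 30-day months and that the full 1600 mAh is usable, and the function and variable names are illustrative only.

```python
# Rough energy budget implied by the reported deployment figures.
BATTERY_CAPACITY_MAH = 1600.0  # from the abstract
LIFETIME_MONTHS = 6            # from the abstract
PACKETS_PER_MINUTE = 4         # from the abstract
DAYS_PER_MONTH = 30            # assumption: not stated in the abstract


def average_current_ma(capacity_mah: float, lifetime_hours: float) -> float:
    """Average current draw if the whole capacity is spent over the lifetime."""
    return capacity_mah / lifetime_hours


def charge_per_packet_uah(capacity_mah: float, total_packets: int) -> float:
    """Battery charge (in microamp-hours) attributable to each generated packet."""
    return capacity_mah * 1000.0 / total_packets


lifetime_hours = LIFETIME_MONTHS * DAYS_PER_MONTH * 24
total_packets = PACKETS_PER_MINUTE * 60 * lifetime_hours

avg_ma = average_current_ma(BATTERY_CAPACITY_MAH, lifetime_hours)
per_packet_uah = charge_per_packet_uah(BATTERY_CAPACITY_MAH, total_packets)

print(f"lifetime: {lifetime_hours} h, packets: {total_packets}")
print(f"average current: {avg_ma:.3f} mA")
print(f"charge per packet: {per_packet_uah:.3f} uAh")
```

Under these assumptions the average current is roughly 0.37 mA, which covers sensing, processing, and radio activity combined; the abstract does not break the budget down further.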
LATONA: An Advanced Server Architecture for Ubiquitous Sensor Network

The emerging Ubiquitous Sensor Network (USN) makes connectionless datagrams and short event packets popular. A large number of short-term event packets in a USN can cause serious problems, such as interrupt handling overhead and context switching overhead. Furthermore, the heavy load of packet security methods demands substantial processing power. Hence, the more USN develops, the more network overhead is loaded onto the host CPU. To solve these problems, we propose a special server component including a TOE (TCP/IP Offloading Engine) and H/W IPSec (IP Layer Security) for USN.

Chi-Hoon Shin, Soo-Cheol Oh, Dae-Won Kim, Sun-Wook Kim, Kyoung Park, Sung-Woon Kim
An Approach for the Reduction of Power Consumption in Sensor Nodes of Wireless Sensor Networks: Case Analysis of Mica2

This paper presents a novel solution for the effective reduction of power consumption in sensor nodes of wireless sensor networks. Possible alternatives to reduce the power consumption in generic sensor nodes are presented. These alternatives are then evaluated for a specific sensor node, the Crossbow Mica2. The case analysis for this sensor node showed that, among the possible alternatives to reduce the power consumption, the radio communication channel presented the best opportunity. A novel solution that integrates the transmitted signal power control with the received information quality is presented in a dynamic mechanism called Maximal Survival Capacity.

Adriano B. da Cunha, Diógenes C. da Silva Jr.
Energy-Driven Partitioning of Signal Processing Algorithms in Sensor Networks

In a sensor network, as we increase the number of nodes, the requirements on network lifetime, and the volume of data traffic across the network, it is often efficient to move towards hierarchical network architectures (e.g., see [5]). In such hierarchical networks, sensor nodes are clustered into groups, and their roles are divided into master and slave nodes for more efficient structuring of network traffic. The operational complexity of each sensor node and the amount of data to be transmitted across sensor nodes strongly influence the energy consumption of the nodes, which ultimately determines the network lifetime. This paper provides a new way of reducing data traffic across nodes by determining and exploiting the lowest data token delivery points within an application graph that is distributed across a network. The technique divides an application graph into two subgraphs and then distributes each subgraph over a master node and its associated slave nodes. The buffer costs of the graph edges over the cutting line correspond to the amount of data to be transmitted between nodes after allocating the two partial subgraphs such that one subgraph executes on a master node, and the other subgraph is distributed across the associated slave nodes. Since the energy consumption on each node is dominated by the transceiver, the reduced data traffic allows for reducing the turn-on time of the transceivers, and thereby leads to high energy savings. This technique also distributes the workload of sensor nodes in a systematic manner. The more balanced workload also contributes to efficient battery usage, and also improves the latency for processing the data frames captured by the sensor nodes.

Dong-Ik Ko, Chung-Ching Shen, Shuvra S. Bhattacharyya, Neil Goldsman
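
The idea in the abstract above, splitting an application graph where the least data crosses the cut, can be illustrated with a small sketch. This is not the authors' algorithm: it is a brute-force enumeration that only works for tiny graphs, and the actor names and token counts are hypothetical. It assumes data flows one way, from the slave-side subgraph to the master-side subgraph.

```python
from itertools import combinations


def min_cost_cut(nodes, edges, source, sink):
    """Return (cost, slave_side) for the cheapest two-way split of a dataflow graph.

    edges maps (producer, consumer) -> tokens transferred per frame.
    The slave side must contain `source`, exclude `sink`, and have no incoming
    edge from the master side, so data crosses the cut in one direction only.
    """
    others = [n for n in nodes if n not in (source, sink)]
    best = None
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            slave = {source, *extra}
            # Reject splits where the master side would feed data back to the slave side.
            if any(u not in slave and v in slave for (u, v) in edges):
                continue
            # Cost of the cut: tokens on edges leaving the slave side.
            cost = sum(c for (u, v), c in edges.items() if u in slave and v not in slave)
            if best is None or cost < best[0]:
                best = (cost, slave)
    return best


# Hypothetical application graph: tokens per frame on each edge.
nodes = ["sense", "filter", "features", "classify"]
edges = {
    ("sense", "filter"): 64,
    ("filter", "features"): 16,
    ("features", "classify"): 4,
    ("sense", "classify"): 2,
}

cost, slave_side = min_cost_cut(nodes, edges, source="sense", sink="classify")
print(cost, sorted(slave_side))
```

In this toy graph the cheapest cut keeps sensing, filtering, and feature extraction on the slave nodes and sends only 6 tokens per frame to the master, compared with 66 tokens if raw samples were forwarded. For realistic graph sizes a polynomial-time minimum-cut method would replace the enumeration.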
Preamble Sense Multiple Access (PSMA) for Impulse Radio Ultra Wideband Sensor Networks

In this paper we propose preamble sense multiple access (PSMA), a random access MAC protocol capable of clear channel assessment in an impulse radio ultra wideband environment. Full compatibility with the IEEE 802.15.4a contention access period is the key design criterion of PSMA, and the goal is to provide an alternative approach to the slotted ALOHA and periodic preamble segment transmission schemes envisioned in 802.15.4a. The evaluation of PSMA consists of a traditional throughput analysis as well as an energy consumption and delay analysis that takes into account the special features of the impulse radio ultra wideband approach. From the analysis we can claim that PSMA has very good energy and delay performance, in addition to satisfactory throughput, when the traffic offered to the channel is low to moderate.

Jussi Haapola, Leonardo Goratti, Isameldin Suliman, Alberto Rabbachin
Security in Wireless Sensor Networks: Considerations and Experiments

Wireless Sensor Networks (WSN) are seen as attractive solutions for various monitoring and controlling applications, a large part of which require protection. Due to the special characteristics of WSNs, e.g. low processing and energy resources and ad hoc networking, developing a reliable security solution becomes a challenging task. In this paper we survey various security aspects of WSNs, consisting of threats, attacks, and proposed solutions. We also present experiments with our own WSN technology (TUTWSN), concentrating on a centralized key distribution and authentication service. Our experiments suggest that a centralized scheme can be a feasible solution in certain WSN configurations.

Panu Hämäläinen, Mauri Kuorilehto, Timo Alho, Marko Hännikäinen, Timo D. Hämäläinen
On Security of PAN Wireless Systems

This paper describes the security features of ZigBee and Bluetooth PAN wireless networks. Using these two wireless systems as examples, we demonstrate the challenges associated with using present wireless systems for applications requiring secure data exchange. The recent penetration of wireless technologies into building and process automation applications further increases the need to fully understand the limitations of the security concepts used.

Ondrej Hyncica, Peter Kacz, Petr Fiedler, Zdenek Bradac, Pavel Kucera, Radimir Vrba

Processor Design

Code Size Reduction by Compiler Tuning

Code size is a main cost factor for many high-volume electronic devices. It is therefore important to reduce the size of the applications in an embedded system. Several methods have been proposed to deal with this problem, mostly based on compressing the binaries. In this paper, we approach the problem from a different perspective. We try to exploit the back-end code optimizations present in a production compiler to generate as few assembly instructions as possible. This approach is based on iterative compilation in which many different versions of the code are tested. We employ statistical analysis to identify the compiler options that have the largest effect on code size. We have applied this technique to gcc 3.3.4 using the MediaBench suite and four target architectures. We show that in almost all cases we produce shorter code than the standard setting -Os, which is designed to optimize for size. In some cases, we generate code that is 30% shorter than -Os.

Masayo Haneda, Peter M. W. Knijnenburg, Harry A. G. Wijshoff
Energy Optimization of a Multi-bank Main Memory

A growing part of the energy of battery-driven embedded systems is consumed by the off-chip main memory. In order to minimize this memory consumption, an architectural solution has recently been adopted. It consists of multi-banking the address space instead of using a monolithic memory. The main advantage of this approach is the capability of setting banks in low-power modes when they are not accessed, such that only the accessed bank is maintained in active mode. In this paper we investigate how this power management capability built into modern DRAM devices can be handled for multi-task applications. We aim to find, at system-level design, both an efficient allocation of application tasks to memory banks and the memory configuration (number of banks and the size of each bank) that lessens the energy consumption. Results show the effectiveness of this approach and the large energy savings.

Hanene Ben Fradj, Sébastien Icart, Cécile Belleudy, Michel Auguin
Probabilistic Modelling and Evaluation of Soft Real-Time Embedded Systems

Soft real-time systems are often analysed using hard real-time techniques, which are not suitable to take into account the deadline miss rate allowed in such systems. Therefore, the resulting system is over-dimensioned and thus expensive. To appropriately dimension soft real-time systems, adequate models, capturing their varying runtime behaviour, are needed. By using the concepts of a mathematically defined language, we provide a modelling approach based on patterns that are able to express the variations appearing in the system timing behaviour. Based on these modelling patterns, models can be easily created and are amenable to average-case performance evaluation. By means of a case study, we show the type of results that can be obtained from such an evaluation and how these results are used to dimension the system.

Oana Florescu, Menno de Hoon, Jeroen Voeten, Henk Corporaal
Hybrid Functional and Instruction Level Power Modeling for Embedded Processors

In this contribution the concept of Functional-Level Power Analysis (FLPA) for power estimation of programmable processors is extended in order to model embedded general-purpose processors as well. The basic FLPA approach is based on the separation of the processor architecture into functional blocks such as the processing unit, clock network, internal memory, etc. The power consumption of these blocks is described by parameterized arithmetic models. By applying a parser-based automated analysis of assembler code, the input parameters of the arithmetic functions, such as the achieved degree of parallelism or the kind and number of memory accesses, can be computed. For modeling an embedded general-purpose processor (here, an ARM940T), the basic FLPA modeling concept had to be extended to a so-called hybrid functional-level and instruction-level model in order to achieve good modeling accuracy. The approach is demonstrated and evaluated by applying a variety of digital signal processing tasks ranging from basic filters to complete audio decoders. Estimated power figures for the inspected tasks are compared to physically measured values. A resulting maximum estimation error of less than 8% is achieved.

Holger Blume, Daniel Becker, Martin Botteck, Jörg Brakensiek, Tobias G. Noll
Low-Power, High-Performance TTA Processor for 1024-Point Fast Fourier Transform

Transport Triggered Architecture (TTA) offers a cost-effective trade-off between the size and performance of ASICs and the programmability of general-purpose processors. This paper presents a study where a high performance, low power TTA processor was customized for a 1024-point complex-valued fast Fourier transform (FFT). The proposed processor consumes only 1.55 μJ of energy for a 1024-point FFT. Compared to other reported FFT implementations with reasonable performance, the proposed design shows a significant improvement in energy-efficiency.

Teemu Pitkänen, Risto Mäkinen, Jari Heikkinen, Tero Partanen, Jarmo Takala
Software Pipelining Support for Transport Triggered Architecture Processors

Many telecommunication applications, especially baseband processing, and digital signal processing (DSP) applications call for high-performance implementations due to the complexity of algorithms and high throughput requirements. In general, the required performance is obtained with the aid of parallel computational resources. In these application domains, software implementations are often preferred over fixed-function ASICs due to their flexibility and ease of development. Application-specific instruction-set processor (ASIP) architectures can be used to efficiently exploit the inherent parallelism of the algorithms while still maintaining the flexibility. Using high-level languages to program processor architectures with parallel resources can lead to inefficient resource utilization; on the other hand, parallel assembly programming is error-prone and tedious.

In this paper, the inherent problems of parallel programming and software pipelining are mitigated with parallel language syntax and automatic generation of software pipelined code for the iteration kernels. With the aid of the developed tool support, the underlying performance of a processor architecture with parallel resources can be exploited and full utilization of the main processing resources is obtained for pipelined loop kernels. The given examples show that efficiency can be obtained without reducing the performance.

Perttu Salmela, Pekka Jääskeläinen, Tuomas Järvinen, Jarmo Takala
SAD Prefetching for MPEG4 Using Flux Caches

In this paper, we consider flux cache prefetching and a media application. We analyze the MPEG4 encoder workload with a realistic data set in a scenario representative of the embedded systems domain. Our study shows that different well-known data prefetch mechanisms gain little reduction in the cache miss ratios when applied to the complete MPEG4 application. Furthermore, we investigate the potential improvement when dedicated prefetching strategies are applied to the sum of absolute differences (SAD) kernels in MPEG4. We propose a flux cache mechanism that dynamically invokes cache designs with dedicated prefetching engines that can fully utilize the available memory bandwidth. We show that our proposal improves the cache miss ratios by a factor close to 3x.

Georgi N. Gaydadjiev, Stamatis Vassiliadis
Effects of Program Compression

The size of the program code has become a critical design constraint in embedded systems, especially in handheld devices. Large program codes require large memories, which increase the size and cost of the chip. In addition, the power consumption is increased due to higher memory I/O bandwidth. Program compression is one of the most often used methods to reduce the size of the program code. In this paper, two compression approaches, dictionary-based compression and instruction template-based compression, were evaluated on a customizable processor architecture with parallel resources. The effects on area and power consumption were measured. Dictionary-based compression reduced the area at best by 77% and power consumption by 73%. Instruction template-based compression resulted in an increase in both area and power consumption and hence turned out to be impractical.

Jari Heikkinen, Jarmo Takala
Integrated Instruction Scheduling and Fine-Grain Register Allocation for Embedded Processors

This paper proposes a new integration technique, called IRIS (Integrated Register allocation and Instruction Scheduling), to combine instruction scheduling and register allocation. Both register allocation and instruction scheduling are performed simultaneously at each variable reference, where the selection between serialization by scheduling and spilling by register allocation is determined. To make the right selection, the costs of serialization and spilling are estimated with a cost model proposed to reduce the complexity of the estimation. Experiments show that IRIS achieves significant improvements when compared to widely used existing techniques.

Dae-Hwan Kim, Hyuk-Jae Lee
Compilation and Simulation Tool Chain for Memory Aware Energy Optimizations

Memories are known to be the energy bottleneck of portable embedded devices. Numerous memory aware energy optimizations have been proposed. However, both the optimization and the validation are performed in an ad-hoc manner as a coherent optimizing compilation and simulation framework does not exist as yet. In this paper, we present such a framework for performing memory hierarchy aware energy optimization. Both the compiler and the simulator are configured from a single memory hierarchy description. Significant savings of up to 50% in the total energy dissipation are reported.

Manish Verma, Lars Wehmeyer, Robert Pyka, Peter Marwedel, Luca Benini
A Scalable, Multi-thread, Multi-issue Array Processor Architecture for DSP Applications Based on Extended Tomasulo Scheme

A scalable, distributed micro-architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of parallel processing on a chip. The architecture is based on a superscalar processor model with out-of-order execution that supports specialized, complex DSP function units and simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine clustering reduces the cycle time for wire-sensitive building blocks of the processor, like the register file, and leads to a distributed architecture model where independent thread processing units, ALUs, register files and memories are distributed across the chip and communicate with each other by special networks, forming a "network-on-a-chip" (NOC) [1]. The communication protocol is a modified version of Tomasulo's scheme [2] that was extended to eliminate all central control structures for the data flow and to support multithreading. The performance of the architecture is scalable with both the number of function units and the number of thread units without having any impact on the processor's cycle time.

Mladen Bereković, Tim Niggemeier
Reducing Execution Unit Leakage Power in Embedded Processors

We introduce low-overhead power optimization techniques to reduce leakage power in embedded processors. Our techniques improve on previous work by a) taking into account the idle time distribution for different execution units, and b) using instruction decode and control dependencies to wake up the gated (but needed) units as soon as possible. We take into account the idle time distribution per execution unit to detect an idle time period as soon as possible. This in turn increases our leakage power savings. In addition, we use information already available in the processor to predict when a gated execution unit will be needed again. This results in early and less costly reactivation of gated execution units. We evaluate our techniques for a representative subset of MiBench benchmarks and for a processor using a configuration similar to Intel's XScale processor. We show that our techniques reduce leakage power considerably while maintaining performance.

Houman Homayoun, Amirali Baniasadi
Memory Architecture Evaluation for Video Encoding on Enhanced Embedded Processors

In this paper we investigate the impact of different memory configurations on the performance and energy consumption of the video encoding applications MPEG-4 and H.264. The memory architecture is integrated with the SIMD-extended embedded processor proposed in our previous work. We explore both dedicated memories and multilevel cache architectures and perform exhaustive simulations. The simulations have been conducted using highly optimized proprietary video encoding code for mobile handheld devices. Our simulation results show that the performance improvement of dedicated memories on video encoding applications is not very significant. The multilevel cache-based architecture processes approximately 17 frames/s compared to 19-22 frames/s for 512 KB of dedicated on-chip zero-wait-state memory. Thus it is difficult to justify using dedicated memory for this kind of embedded system when energy consumption and cost of implementation are also considered.

Ali Iranpour, Krzysztof Kuchcinski
Advantages of Java Processors in Cache Performance and Power for Embedded Applications

Java, with its advantages as a widespread multiplatform object-oriented language, has been gaining popularity in the embedded system market over the years. However, because of its extra layer of interpretation, it is also believed to be slow in execution. When this execution is done directly in hardware, though, advantages due to its stack nature start to appear. One of these advantages concerns memory utilization, resulting in fewer accesses and cache misses. In this work we analyze this impact on performance and energy consumption, comparing a Java processor with a RISC processor based on a MIPS with similar characteristics.

Antonio Carlos S. Beck, Mateus B. Rutzig, Luigi Carro

Dependable Computing

CARROT – A Tool for Fast and Accurate Soft Error Rate Estimation

We present a soft error rate (SER) analysis methodology within a simulation and design environment that covers a broad spectrum of design problems and parameters. Our approach includes modeling of the particle hit at the transistor level, fast Monte-Carlo type simulation to obtain the latching probability of a particle hit on all nodes of the circuit, embedded timing analysis to obtain the latching window, and fine-grained accounting of the electrical masking effects to account for both the effects of scaling and of pulse duration versus the period of the system clock to get an estimate of the maximum SER of the circuit. This approach has been implemented in CARROT and placed under a broad design environment to assess design tradeoffs with SER as a parameter.

Dimitrios Bountas, Georgios I. Stamoulis
A Scheduling Strategy for a Real-Time Dependable Organic Middleware

This paper presents the architecture and conception of a dependable organic middleware based on the already existing, non-organic middleware OSA+. We show a scheduling strategy which assigns missions in real-time to a distributed set of platforms in the scope of a fabric automation scenario. The missions are distributed to different robots by the organic middleware whose scheduling includes organic aspects like self-organization, self-optimization and self-healing.

Uwe Brinkschulte, Alexander von Renteln, Mathias Pacher
Autonomous Construction Technology of Community for Achieving High Assurance Service

In the retail business, under an evolving market, users continuously seek to use appropriate services based on their preferences and situations. Such requirements cannot be satisfied by a conventional centralized system, due to the dynamic changes in user requirements. The Autonomous Decentralized Community System (ADCS) has been proposed to realize a system that satisfies such requirements. The system provides the flexibility to cope with dynamic changes in the environment, but since ADCS is a purely decentralized system, it has no entity that monitors the whole system to maintain timeliness, which is an essential factor for assurance. In this paper, Autonomous Construction Technology is proposed to improve the response time; it integrates and divides a community in order to achieve the optimal size depending on changes in the environment. The effectiveness is verified through simulation.

Kotaro Hama, Yuji Horikoshi, Yosuke Sugiyama, Kinji Mori
Preventing Denial-of-Service Attacks in Shared CMP Caches

Denial-of-Service (DoS) attacks try to exhaust some shared resources (e.g. process tables, functional units) of a service-centric provider. As Chip Multi-Processors (CMPs) are becoming mainstream architecture for server class processors, the need to manage on-chip resources in a way that can provide QoS guarantees becomes a necessity. Shared resources in CMPs typically include L2 cache memory. In this paper, we explore the problem of managing the on-chip shared caches in a CMP workstation where malicious threads or just cache “hungry” threads try to hog the cache giving rise to DoS opportunities. An important characteristic of our method is that there is no need to distinguish between malicious and “healthy” threads. The proposed methodology is based on a statistical model of a shared cache that can be fed with run-time information and accurately describe the behavior of the shared threads. Using this information, we are able to understand which thread (malicious or not) can be “compressed” into less space with negligible damage and to drive accordingly the underlying replacement policy of the cache. Our results show that the proposed attack-resistant replacement algorithm can be used to enforce high-level policies such as policies that try to maximize the “usefulness” of the cache real estate or assign custom space-allocation policies based on external QoS needs.

Georgios Keramidas, Pavlos Petoumenos, Stefanos Kaxiras, Alexandros Antonopoulos, Dimitrios Serpanos

Architectures and Implementations

A Method for Router Table Compression for Application Specific Routing in Mesh Topology NoC Architectures

One way to specialize a general purpose multi-core chip built using NoC principles is to provide a mechanism to configure an application specific deadlock free routing algorithm in the underlying communication network. A table in every router, implemented using a writable memory, can provide a possibility of specializing the routing algorithm according to the application requirements. In such an implementation the cost (area) of the router will be proportional to the size of the routing table. In this paper, we propose a method to compress the routing table to reduce its size such that the resulting routing algorithm remains deadlock free as well as has high adaptivity. We demonstrate through simulation based evaluation that our application specific routing algorithm gives much higher performance, in terms of latency and throughput, as compared to general purpose algorithms for deadlock free routing. We also show that a table size of two entries for each output port gives performance within 3% of the uncompressed table.

Maurizio Palesi, Shashi Kumar, Rickard Holsmark
Real-Time Embedded System for Rear-View Mirror Overtaking Car Monitoring

The main goal of an overtaking monitor system is the segmentation and tracking of the overtaking vehicle. This application can be addressed through an optic flow driven scheme. We can focus on the rear mirror visual field by placing a camera on top of it. When we drive a car, the ego-motion optic flow pattern is more or less unidirectional, i.e. all the static objects and landmarks move backwards while the overtaking cars move forward towards our vehicle. This well-structured motion scenario facilitates the segmentation of regular motion patterns that correspond to the overtaking vehicle. Our approach is based on two main processing stages: first, the computation of optical flow using a novel superpipelined and fully parallelized architecture capable of extracting the motion information at a frame rate of up to 148 frames per second at VGA resolution (640x480 pixels). Second, a tracking stage based on motion pattern analysis provides an estimated position of the overtaking car. We analyze the system performance and resources, and show some promising results using a bank of overtaking car sequences.

Javier Díaz, Eduardo Ros, Sonia Mota, Rodrigo Agis
Design of Asynchronous Embedded Processor with New Ternary Data Encoding Scheme

This paper presents a low-power implementation of the asynchronous 8051 processor, called A8051, which employs a new data encoding method, RT/NRT encoding, to reduce switching activity. The paper focuses on power analysis of the proposed data encoding based on the experimental design of A8051. The proposed data encoding method is devised to meet the DI assumption using ternary logic. This method reduces not only the number of wires but also the switching activity. In terms of switching activity, the proposed ternary encoding achieves a 26% reduction compared to conventional ternary encoding. A8051 using RT/NRT encoding shows a 24% higher instructions-per-energy metric compared to A8051 using dual-rail encoding.

Je-Hoon Lee, Eun-Ju Choi, Kyoung-Rok Cho
Hardware-Based IP Lookup Using n-Way Set Associative Memory and LPM Comparator

The IP lookup process becomes the bottleneck of packet transmission as IP traffic increases. Hardware-based IP lookup is desirable for high-speed routers. However, IP lookup schemes using an index-based table are not efficient due to heavy prefix expansion. In this paper, an efficient hardware-based IP lookup scheme using n-way set associative memory and an LPM comparator is proposed. It reduces memory requirements to about 50% or below compared with the previous scheme and provides faster updating speed. It also completes an IP routing lookup with two memory accesses.

SangKyun Yun
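
For readers unfamiliar with the longest-prefix-match (LPM) operation that the entry above refers to, the following is a minimal software sketch. It is not the authors' n-way set associative hardware scheme: it uses a plain linear scan over the routing table, and the function name `longest_prefix_match` and the example routes are hypothetical.

```python
import ipaddress


def longest_prefix_match(table, addr):
    """Return the next hop of the most specific prefix containing addr.

    table: iterable of (prefix_string, next_hop) pairs (hypothetical layout).
    addr:  dotted-quad IPv4 address string.
    Returns None when no prefix matches.
    """
    ip = int(ipaddress.IPv4Address(addr))
    best_hop = None
    best_len = -1
    for prefix, next_hop in table:
        net = ipaddress.IPv4Network(prefix)
        mask = int(net.netmask)
        # A prefix matches when the masked address equals the network address.
        if (ip & mask) == int(net.network_address) and net.prefixlen > best_len:
            best_hop, best_len = next_hop, net.prefixlen
    return best_hop


# Illustrative routing table (made-up routes).
ROUTES = [
    ("0.0.0.0/0", "default"),
    ("10.0.0.0/8", "A"),
    ("10.1.0.0/16", "B"),
]
```

With this table, an address inside `10.1.0.0/16` resolves to `B` even though it also matches the `/8` and the default route, because the longest matching prefix wins.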
A Flash File System to Support Fast Mounting for NAND Flash Memory Based Embedded Systems

In embedded systems, NAND flash memory is typically used as a storage medium because of its non-volatility, fast access time and solid-state shock resistance. However, it suffers from out-of-place updates, limited erase cycles and page-based read/write operations. Flash file systems such as JFFS2 and YAFFS allocate memory space using LFS (Log-structured File System) to solve these problems. Because of this, many pieces of a file are scattered throughout flash memory. Therefore, these file systems must scan the entire flash memory to construct their data structures during mounting. This means that it takes a long time to mount such file systems on a large chip. In this paper, we design and propose a new flash memory file system which targets mobile devices that require fast mounting. We measured the file system performance, and the results show that we improve the mounting time by 64%–76%, depending on flash usage, compared to YAFFS.

Song-Hwa Park, Tae-Hoon Lee, Ki-Dong Chung
Rescheduling for Optimized SHA-1 Calculation

This paper proposes the rescheduling of the SHA-1 hash function operations on hardware implementations. The proposal is mapped on the Xilinx Virtex II Pro technology. The proposed rescheduling allows for a manipulation of the critical path in the SHA-1 function computation, facilitating the implementation of a more parallelized structure without an increase on the required hardware resources. Two cores have been developed, one that uses a constant initialization vector and a second one that allows for different Initialization Vectors (IV), in order to be used in HMAC and in the processing of fragmented messages. A hybrid software/hardware implementation is also proposed. Experimental results indicate a throughput of 1.4 Gbits/s requiring only 533 slices for a constant IV and 596 for an imputable IV. Comparisons to SHA-1 related art suggest improvements of the throughput/slice metric of 29% against the most recent commercial cores and 59% to the current academia proposals.

Ricardo Chaves, Georgi Kuzmanov, Leonel Sousa, Stamatis Vassiliadis
Software Implementation of WiMAX on the Sandbridge SandBlaster Platform

This paper describes a Sandbridge Sandblaster system implementation including both hardware and software elements for a WiMAX 802.16e system. The system is implemented on the fully functional multithreaded Sandblaster multiprocessor SB3010 SoC chip. The entire communication protocol, physical layer and MAC, has been implemented in software using pure ANSI C programming language and it executes in real time. In this paper, we also present a radio propagation analysis specific to the Samos island at the workshop location, and the DSP execution performance.

Daniel Iancu, Hua Ye, Emanoil Surducan, Murugappan Senthilvelan, John Glossner, Vasile Surducan, Vladimir Kotlyar, Andrei Iancu, Gary Nacer, Jarmo Takala
High-Radix Addition and Multiplication in the Electron Counting Paradigm Using Single Electron Tunneling Technology

The Electron Counting (EC) paradigm has proved to be an efficient methodology for computing arithmetic operations in Single Electron Tunneling (SET) technology. In previous research, EC-based addition and multiplication have been implemented. However, the effective performance of these schemes is diminished by practical limitations imposed by fabrication technology. To alleviate this problem, high radix computation was suggested. In this paper we present a high radix EC addition scheme and a high radix EC multiplication scheme. For both arithmetic operations, we first briefly present the normal (non high radix) EC schemes. Second, we present the high radix schemes and explain their functionality. Third, we explain the implementation of the high radix schemes in detail. Finally, we present simulation results and evaluate the schemes in terms of delay and area cost.

Cor Meenderinck, Sorin Cotofana
Area, Delay, and Power Characteristics of Standard-Cell Implementations of the AES S-Box

Cryptographic substitution boxes (S-boxes) are an integral part of modern block ciphers like the Advanced Encryption Standard (AES). There exists a rich literature devoted to the efficient implementation of cryptographic S-boxes, in which hardware designs for FPGAs and standard cells have received particular attention. In this paper we present a comprehensive study of different standard-cell implementations of the AES S-box with respect to timing (i.e. critical path), silicon area, power consumption, and combinations of these cost metrics. We examined implementations which exploit the mathematical properties of the AES S-box, constructions based on hardware look-up tables, and dedicated low-power solutions. Our results show that the timing, area, and power properties of the different S-box realizations can vary by more than an order of magnitude. In terms of area and area-delay product, the best choices are implementations which calculate the S-box output. On the other hand, the hardware look-up solutions are characterized by the shortest critical path. The dedicated low-power implementations not only reduce power consumption by a large degree, but also show good timing properties and offer the best power-delay and power-area products.

Stefan Tillich, Martin Feldhofer, Johann Großschädl
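
The distinction drawn in the entry above between calculating the S-box output and looking it up can be illustrated in software. The sketch below follows the standard AES definition (multiplicative inverse in GF(2^8) followed by an affine transformation) and then builds a 256-entry table from it. It is a software illustration only, not any of the standard-cell designs the paper evaluates, and the names `gf_mul`, `gf_inv`, `sbox_computed`, and `SBOX_TABLE` are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254; 0 maps to 0 by convention."""
    result, power, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exp >>= 1
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_computed(x: int) -> int:
    """Calculated variant: field inversion, then the AES affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# Look-up variant: precompute all 256 outputs once, then index the table.
SBOX_TABLE = [sbox_computed(x) for x in range(256)]
```

The computed form trades storage for arithmetic, while the table form trades 256 bytes of storage for a single indexed read; the hardware cost trade-offs reported in the abstract are of course specific to the standard-cell designs studied there.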

Embedded Sensor Systems

Integrated Microsystems in Industrial Applications

Since the 1960s, etching of silicon has been used to make three-dimensional structures. The first devices were pressure sensors using a thin silicon membrane. More recently, accelerometers and gyroscopes have been developed. All of these devices can be integrated with electronics, enabling the introduction of extra functions such as self-test and self-calibration. A broader look at sensors shows a wealth of integrated devices. The critical issues, if these devices are to find applications, are reliability and packaging. A number of silicon sensors have shown great commercial success. This paper will give a brief overview of the technologies and some examples of applications.

Paddy J. French
A Solid-State 2-D Wind Sensor

This paper describes the industrial realization of a solid-state wind sensor, that is, one without moving parts. The key component of the sensor is a heated silicon chip that is packaged in such a way that it is non-uniformly cooled by the wind. The resulting flow-induced temperature gradient is measured by on-chip temperature sensors. Their output is then digitized and processed by a microprocessor in order to determine both wind speed and direction. For wind speeds between 0.1 and 25 m/s, the errors in the computed wind speed and direction are less than 0.5 m/s (or ±3%) and ±3° respectively.

K. A. A. Makinwa, Johan H. Huijsing, Arend Hagedoorn
Fault-Tolerant Bus System for Airbag Sensors and Actuators

In order to satisfy the increasing safety requirements for airbag deployment systems in cars, the number of airbag actuators and sensors increases steadily. It is important to keep the complexity of the system manageable, for example by replacing the current point-to-point systems by a networked system. This paper gives an overview of such a system and discusses some of the interesting implementation details.

Klaas-Jan de Langen
Backmatter
Metadata
Title
Embedded Computer Systems: Architectures, Modeling, and Simulation
Edited by
Stamatis Vassiliadis
Stephan Wong
Timo D. Hämäläinen
Copyright Year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-36411-5
Print ISBN
978-3-540-36410-8
DOI
https://doi.org/10.1007/11796435
