Keynote Speech

Are Lessons Learnt in Mobile Ad Hoc Networks Useful for Wireless Sensor Networks?

Many researchers consider wireless sensor networks (WSNs) as special case of Mobile Ad Hoc Networks (MANETs). Although WSNs do share some similarities with MANETs, WSNs are very different from MANETs and have many unique research issues. I argue that lessons learnt from research in MANETs are of little use when studying WSNs. This talk will address the major differences between MANETs and WSNs. The focus of this talk will be on new challenging research issues in WSNs, such as ID management, adaptive route – an exciting research area.

Lionel Ni

Compiler-Directed Scratchpad Memory Management

On-chip memory, in the form of (hardware-managed) cache, (software-managed) scratchpad memory (SPM) or some combination of both, is widely used in embedded systems. Most high-end embedded systems have both cache and SPM on-chip since each addresses a different need. Caches allow easy integration and are often effective but are unpredictable. SPMs are more energy-efficient than caches since they do not need complex tag-decoding logic. In addition, SPMs provide absolutely predictable performance but the programmer or compiler must schedule explicit data/instruction transfers between the SPM and off-chip main memory in an embedded system. In today’s industry, this task is largely accomplished manually. The programmer often spends a lot of time on partitioning data and/or instructions and inserting explicit data/instruction transfers required between the SPM and main memory. Such a manual approach is time-consuming and error-prone. Obtaining satisfactory solutions for large application programs by hand can be challenging. Furthermore, hand-crafted code is not portable since it is usually customised for one particular architecture.

This talk introduces a compiler approach, called

memory coloring

, that we have recently developed to automatically allocating the arrays in a program to an SPM. The arrays are frequently used in embedded applications such as image processing and signal processing. The novelty of this approach lies in partitioning an SPM into a pseudo register file, splitting the live ranges of arrays to create potential data transfer statements between the SPM and off-chip main memory, and finally, adapting an existing graph-colouring algorithm for register allocation to assign the arrays in the program into the register file. This compiler-directed approach is efficient due to the practical efficiency of graph-colouring algorithms. We have implemented this work in the SUIF/machSUIF compiler framework. Preliminary results over benchmarks show that this compiler-directed approach represents a promising solution to automatic SPM management.

Jingling Xue

Heterogeneous Multi-processor SoC: An Emerging Paradigm of Embedded System Design and Its Challenges

The recent years have witnessed a variety of new embedded applications. Typical examples include mobile multimedia gadgets, digital televisions, high-end cell phones, wireless network applications, etc. The salient features of these applications include more comprehensive functionalities, higher performance demand, and low-power consumption. These requirements render the traditional single processor-based embedded systems no longer an appropriate realization for such applications. On the other hand, the continual advance of VLSI technologies enables more and more transistors to be integrated on a single chip. The International Technology Roadmap for Semiconductors predicts that chips with a billion transistors are within reach. As a result, the push (application demands) and pull (VLSI technology) forces together give birth to the multi-processor system-on-chips (MPSoCs).

Heterogeneous MPSoCs are different from traditional embedded systems in many aspects and they ask for new design and implementation methodologies. Heterogeneous MPSoCs are not merely a hardware design. The complexity and heterogeneity of the system significantly increase the complexity of the HW/SW partitioning problem. Meanwhile, evaluating the performance and verifying its correctness is much more difficult compared to traditional single processor-based embedded systems. Constructing a simulator to simulate the system’s behavior and evaluate its performance takes more effort compared to conventional embedded systems. The verification of the system also becomes challenging.

Programming a heterogeneous MPSoC is another challenge to be faced. This problem arises simply because there are multiple programmable processing elements and since they are heterogeneous, software designer needs to have expertise on all of these processing elements and needs to take a lot of care on how to make the software running as a whole.

There are a lot more issues that do not appear or easier to tackle on traditional embedded systems, trade-offs between performance and low-power will dominate the design life time. However, the incoming challenges also brought us many opportunities either to industry and academic research.

Xu Cheng

Track 1: Embedded Hardware

Trace-Based Runtime Instruction Rescheduling for Architecture Extension

The update of embedded processor may introduce new function unit, new coprocessor, or even new additional DSP. In many cases, software application can’t be fully rebuilt to utilize these new resources. This paper describes a novel framework, called Runtime Instruction Rescheduling (RIR), to solve this problem. RIR can find hot spots in binary codes, build a large instruction window to generate trace, reschedule and optimize instructions in traces. Different scheduling policies have been simulated. Shown from detailed simulation, RIR helps the old binary codes benefit from new hardware resources.

YuXing Tang, Kun Deng, HongJia Cao, XingMing Zhou

Bioinformatics on Embedded Systems: A Case Study of Computational Biology Applications on VLIW Architecture

Bioinformatics applications represent the increasingly important workloads. Their characteristics and implications on the underlying hardware design, however, are largely unknown. Currently, biological data processing ubiquitously relies on the high-end systems equipped with expensive, general-purpose processors. The future generation of bioinformatics requires the more flexible and cost-effective computing platforms to meet its rapidly growing market. The programmable, application-specific embedded systems appear to be an attractive solution in terms of easy of programming, design cost, power, portability and time-to-market. The first step towards such systems is to characterize bioinformatics applications on the target architecture. Such studies can help in understanding the design issues and the trade-offs in specializing hardware and software systems to meet the needs of bioinformatics market. This paper evaluates several representative bioinformatics tools on the VLIW based embedded systems. We investigate the basic characteristics of the benchmarks, impact of function units, the efficiency of VLIW execution, cache behavior and the impact of compiler optimizations. The architectural implications observed from this study can be applied to the design optimizations. To the best of our knowledge, this is one of the first such studies that have ever been attempted.

Yue Li, Tao Li

The Design Space of CMP vs. SMT for High Performance Embedded Processor

In embedded world, many researchers have begun to examine Simultaneous Multithreading (SMT) and Chip Multiprocessing (CMP) for various demands. SMT and CMP both make a chip to achieve greater throughput. But the power, chip size and thermal features are also important for embedded system. In this paper we compare the design space of both architecture. As simulation results shown, although extending wide-issue processor into SMT has the advantage of small design changes, high hardware resource efficiency and high throughput, CMP presents better scalability in raw performance and power metric under heavy multithreaded workload than SMP. CMP integrates several similar processor in a single chip, so it can’t uses the chip area efficiently like SMT. And the chip area limits will prevent the CMP from equipping a large L2 cache, which will hurt the performance of memory-bound application. The evaluation also points out the design problem and possible solution for power, chip size and thermal efficiency in CMP and SMT.

YuXing Tang, Kun Deng, XingMing Zhou

Reconfigurable Microarchitecture Based System-Level Dynamic Power Management SoC Platform

Power-aware design is one of the most important areas to be emphasized in multimedia mobile systems, in which data transfers dominate the power consumption. In this paper, we propose a new architecture for motion compensation (MC) of H.264/AVC with power reduction by decreasing the data transfers. For this purpose, a reconfigurable microarchitecture based on data type is proposed for interpolation and it is mapped onto the dedicated motion compensation IP (intellectual property) effectively without sacrificing the performance or the system latency. The original quarter-pel interpolation equation that consists of one or two half-pel interpolations and one averaging operation is designed to have different execution control modes, which result in decreasing memory accesses greatly and maintaining the system efficiency. The simulation result shows that the proposed method could reduce up to 87% of power caused by data transfers over the conventional method in MC module.

Cheong-Ghil Kim, Dae-Young Jeong, Byung-Gil Kim, Shin-Dug Kim

Track 2: Embedded Software

A Methodology for Software Synthesis of Embedded Real-Time Systems Based on TPN and LSC

This paper shows an approach for software synthesis in embedded hard real-time systems starting from Live Sequence Charts (LSC) scenarios as specification language. As the name suggests, LSCs specify liveness, that is, things that must happen. Therefore allowing the distinction between possible and necessary behavior as well as the specification of possible anti-scenarios. Embedded software has become much harder to design due to the diversity of requirements and high complexity. In such systems, correctness and timeliness verification is an issue to be concerned. The software synthesis method takes a specification (in this case composed by LSC scenarios) and automatically generates a program source code where: (i) functionalities and constraints are satisfied; and (ii) operational support for task’s execution is provided. This paper adopts a time Petri net (TPN) formalism for system modeling in order to find feasible pre-runtime schedules, and for synthesizing predictable and timely scheduled code. Embedded software synthesis has been receiving much attention. However, few works deal with software synthesis for hard real-time systems considering arbitrary precedence and exclusion relations.

Leonardo Amorim, Raimundo Barreto, Paulo Maciel, Eduardo Tavares, Meuse Oliveira Jr, Arthur Bessa, Ricardo Lima

Ahead of Time Deployment in ROM of a Java-OS

This article shows how it is possible to place a great part of a Java system in read-only memory in order to fit with the requirements of tiny devices. Java systems for such devices are commonly deployed off-board, then embedded on the target device in a ready-to-run form. Our approach is to go as far as possible in this deployment, in order to maximize the amount of data placed in read-only memory. Doing so, we are also able to reduce the overall size of the system.

Kevin Marquet, Alexandre Courbot, Gilles Grimaud

The Research on How to Reduce the Number of EEPROM Writing to Improve Speed of Java Card

Java Card technology enables smart cards and other devices with very limited memory to run small applications, called applets. It provides users with a secure and interoperable execution platform that can store and update multiple applications on a single device. This Java Card technology is now a mature and accepted standard smart card standards and SIM technology. However, the main concern of Java Card is now its low execution speed caused by the hardware limitation. In this paper, we propose several ideas about how to improve an execution speed of Java Card. The key idea of our approach is that an EEPROM writing operation is more expensive than that of RAM. We suggest how to use RAM as much as possible; Transaction_In_RAM(TrIR), Resolution_In_RAM and Java Object-buffer.

Min-Sik Jin, Won-Ho Choi, Yoon-Sim Yang, Min-Soo Jung

A Packet Property-Based Task Scheduling Policy for Control Plane OS in NP-Based Applications

In NP-based networking elements, there are various kinds of packet traffic between data plane and control plane, which have different priorities and are handled by different tasks running on control plane OS. The critical packets need to be processed in time, otherwise the system, even the network, may enter some unstable states. Thus, the packets should be processed according to their priorities, i.e., packet-processing tasks for more important packets should be executed sooner if they are both in ready state. From the perspective of control plane OS design, packet-processing tasks should be scheduled based on some properties of packets waiting to be processed. This paper proposes a packet property-based task scheduling policy to alleviate the problem. The design and implementation are described and the performance results are discussed. The results show that this scheduling policy can achieve our design goal properly.

Shoumeng Yan, Xingshe Zhou, Fan Zhang, Yaping Wang

RBLS: A Role Based Context Storage Scheme for Sensornet

To addressing the self adaptation problem arises from large scale densely deployed sensornet, we argue that integrating the principle of context aware with sensornet is feasible. To build such a context aware sensornet, proper context describing and storing mechanisms must be provided. In this paper, we propose RBLS, a Role Based Local Storage scheme. RBLS is designed simple and energy efficient. Aimed at providing context storage support for sensornet, RBLS stores contexts at node’s local space and dynamically allocates extra spaces according to the roles a node holding. A “snapshot” is used by RBLS to record a neighbor’s private contexts. We evaluate the performance of RBLS against a primitive scheme. Simulation results are included in this paper.

Qin Huaifeng, Zhou Xingshe

CDP: Component Development Platform for Communication Protocols

Complexity of software systems has significantly grown with social dependence on computer system, especially for mobile and internet. So we present component-based communication protocol architecture. In this architecture, Component Development Platform (CDP) is the kernel software. CDP is one of the rapid communication components’ development tools. It is the collection of views and plug-ins to form a series of tools and it takes much flexibility and configuration ability. For the customization of component-based communication protocols, code analysis tools make a good abstract from original practical source codes to visual structure model too. This becomes the feasible guidance and roadmap to develop the components and component-based communication protocols.

Hong-Jun Dai, Tian-Zhou Chen, Chun Chen, Jiang-Wei Huang

TrieC: A High-Speed IPv6 Lookup with Fast Updates Using Network Processor

Address lookup is one of the main bottlenecks in Internet backbone routers, as it requires the router to perform a longest-prefix-match when searching the routing table for a next hop. Ever-increasing Internet bandwidth, continuously growing prefix table size and inevitable migration to IPv6 address architecture further exacerbate this situation. In recent years, a variety of high-speed address lookup algorithms have been proposed, however most of them are inappropriate to IPv6 lookup. This paper proposes a high-speed IPv6 lookup algorithm TrieC, which achieves the goals of high-speed address lookup, fast incremental prefix updates, high scalability and reasonable memory requirement by taking great advantage of the network processor architecture. Performance of TrieC is carefully evaluated with several IPv6 routing tables of different sizes and different prefix length distributions on Intel IXP2800 network processor(NPU). Simulation shows that TrieC can support IPv6 lookup at OC-192 line rate. Furthermore, if TrieC is pipelined in hardware, it can achieve one IPv6 lookup per memory access.

Xianghui Hu, Bei Hua, Xinan Tang

Separate Compilation for Synchronous Modules

Synchronous models are useful for designing real-time embedded systems because they provide timing control and deterministic concurrency. However, the semantics of such models usually require an entire system to be compiled at once to analyze the dependencies among modules. The alternative is to write modules that can respond when the values of some of their inputs are unknown, a tedious and error-prone process.

We present a compilation technique that allows a programmer to describe synchronous modules without having to consider undefined inputs. Our algorithm transforms such a description into code that does as much as it can with undefined inputs, allowing modules to be compiled separately and assembled later.

We implemented our technique in a compiler for the Esterel language and present results that show the technique does not impose a substantial overhead.

Jia Zeng, Stephen A. Edwards

Implementation of Hardware and Embedded Software for Stream Gateway Interface Supporting Media Stream Transmissions with Heterogeneous Home Networks

In modern information society, we are confronted by an alteration paradigm called the information technology (IT) revolution. Recently, along with advances in various computing technologies, new concepts, such as a convergence phenomenon among information devices, have given rise to ubiquitous and pervasive computing around the world. Also, the realization of the home network, which is one of the core solutions for future computing, has become an important issue. This paper describes the architecture of the next generation home network interface hardware, which will support various home networks and media services through a media stream transmission in heterogeneous home networks. For the transmission of media streams, such as MPEG-2 TS and DVD, we designed and implemented hardware and embedded software for the stream gateway interface which supports media stream transmission with heterogeneous home networks.

Young-choong Park, Seung-ok Lim, Kwang-sun Choi, Kawng-mo Jung, Dongil Shin

Track 3: Real-Time Systems

On Using Locking Caches in Embedded Real-Time Systems

Cache memories are crucial to obtain high performance on contemporary processors. However, they have been traditionally avoided in embedded real-time systems due to their lack of determinism. Unfortunately, most of the techniques to attain predictability on caches are complex to apply, precluding their use on real applications. This work reviews several techniques developed by the authors to use cache memories in “real” embedded real-time systems, with the ease of use in mind. Those techniques are based on a locking cache, which offers a very predictable behaviour. Both static and dynamic use are proposed as well as the algorithms and methods required to make the schedulability analysis using two different scheduling policies. Also proposed is a genetic algorithm that finds, within acceptable computational cost, the sub-optimal set of instructions that must be preloaded in cache. Finally, a set of statistical analyses compares the locking cache versus a conventional one.

A. Martí Campoy, E. Tamura, S. Sáez, F. Rodríguez, J. V. Busquets-Mataix

Trace Acquirement from Real-Time Systems Based on WCET Analysis

Embedded real-time systems often operate under strict timing constraints. In order to test a real-time system thoroughly, we should instrument the system under test with assertions. Thus, the timing behaviors of such a system will change more or less. In this paper, we present two methods to weaken or even remove the timing related impact of the inserted assertions. Firstly, a new monitoring schema is presented which has less time intrusive than software motoring and can test the target system completely. This schema is a mixture of hardware monitoring and software monitoring. Secondly, in order to weaken the time intrusiveness of assertions as much as possible, we present a WCET analysis based time correction method. This method can compute the accurate execution time of assertions and corrects the recorded time of interested events.

Ji Meng-Luo, Wang Xin, Qi Zhi-Chang

Elimination of Non-deterministic Delays in a Real-Time Database System

In a real-time database system, the conventional method of transaction method can not be used. In these methods, the deadlock detection is based on (a) use of delay to cause and watch deadlocks, (b) high overheads of periodic checking (c) Non-deterministic nature of the delays, and lastly, (d) difficulties to scale up the existing solutions (centralized). The proposal is based on enhanced local processing and peer-to-peer (P2P) communication for distributed transaction process. The earlier procedures incorporate additional steps for handling wait-for states and deadlocks. This activity is carried out by methods based on wait-for-graphs or probes. These methods introduce a centralized computation at source (for each occurrence of a delay). The proposal introduces asynchronous operations in transaction processing. As a result the detection processes do not wait for occurrences of delays (time-out). These start the delay elimination process instantaneously. The technique incurs low overheads and eliminates the possibility of occurrence of waiting.

Masaki Hasegawa, Subhash Bhalla, Laurence Tianruo Yang

Solving Real-Time Scheduling Problems with Model-Checking

Real-time scheduling is a well-studied field with mature techniques such as Rate Monotonic Analysis. In this paper, we investigate an alternative approach to solving real-time scheduling problems with model-checking. We use the modeling formalism Hybrid Automata and the model-checker HyTech for this purpose, and illustrate advantages and limitations of this approach as compared to the conventional real-time scheduling techniques. In particular, we can use model-checking for analysis of best-case response time of tasks in addition to the worst-case response time, and we can take advantage of HyTech’s parametric analysis capability to derive task parameters such as the critical scaling factor.

Zonghua Gu

Efficient FPGA Implementation of a Knowledge-Based Automatic Speech Classifier

Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of Automatic Speech Recognition (ASR) systems are comparable to Human Speech Recognition (HSR) only under very strict working conditions, and in general far lower. Incorporating acoustic-phonetic knowledge into ASR design has been proven a viable approach to rise ASR accuracy. Manner of articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner of articulation attributes starting from representations of speech signal frames. In this paper an optimized digital Knowledge-based Automatic Speech Classifier for real-time applications is implemented on FPGA using six attribute scoring Multi-Layer Perceptrons (MLP). Digital MLP key features are a virtual neuron architecture and use of sinusoidal activation functions for the hidden layer. Implementation results on FPGA show that use of sinusoidal activation functions decrease hardware resource usage of more than 50% for slices, FFs, LUTs and more than 35% for FPGA RAM blocks when compared with the standard sigmoid-based neuron implementation. Furthermore, neuron virtualization allows for a significant decrease of concurrent memory access, resulting in improved performance for the entire attribute scoring module.

Sabato M. Siniscalchi, Fulvio Gennaro, Salvatore Vitabile, Antonio Gentile, Filippo Sorbello

Track 4: Power-Aware Computing

A Topology Control Method for Multi-path Wireless Sensor Networks

Future sensor networks will be consisted of large number of untethered and unattended sensors. Energy efficiency and load balance will be two important design issues for these networks. Some researches [2] [3] have found that topology control, which changes the set of neighbors of some nodes in the networks, can be used to improve the energy efficiency and load balance. In this paper, we take link reliability and multi-path into consideration when designing topology control algorithms. Based on an analytical link loss model [8], the relationship of energy efficiency, load balance and number of neighbors is analyzed. We found that there is a contradiction in improving energy efficiency and load balance at the same time. A layered topology control method LELBM (Layered Energy-efficient and Load Balance Method) is proposed to adjusting network’s topology for increasing energy efficiency and meanwhile getting good load balance. Analysis and simulation show that this method can significantly improve the network’s performance.

Zhendong Wu, Shanping Li, Jian Xu

Dynamic Threshold Scheme Used in Directed Diffusion

Since sensor nodes are generally constrained in on-board energy supply, saving energy is crucial in extending the lifetime of the wireless sensor networks. In this paper, we present a dynamic threshold scheme used in the data propagation phase of Directed Diffusion, which can decrease the traffic between nodes and therefore prolongs the longevity of wireless sensor networks. The sensor node sends data to the sink only when the sampling value reaches the hard threshold and the difference between the current sample value and the last sent value is greater than the soft threshold, which reduces data traffic to conserve energy. The soft threshold will be adjusted by using the historical values to make the proportion of sending due to timeout equal to the expected value. Whether the values of a sensed attribute change drastically or not, the data sent will continue at a reasonable rate, which adapts the sensor network to the changeful circumstance. Simulation results show that the improved Directed Diffusion based on dynamic threshold can prolong the network lifetime about 35% compared with the original system and can reacting immediately to drastic change in the values of a sensed attribute.

Ning Hu, Deyun Zhang, Fubao Wang

Compiler-Directed Energy-Aware Prefetching Optimization for Embedded Applications

High energy consumption has become a limiting factor for battery-operated embedded systems. Most low-power compiler optimization techniques take the approach of minimizing the energy consumption while meeting small performance loss. In addition, it is possible that the available

energy budget

is not sufficient to meet the optimal performance objective. In such situation, energy-constrained optimization is more significant. In this paper, we explore two kinds of energy-aware prefetching optimizations: prefetching optimization with minimizing energy consumption and energy-constrained prefetching optimization. We exploit energy saving opportunities through reducing memory stalls and CPU stalls caused by too early or too late prefetching. We build models for these two kinds of energy-aware prefetching optimization approaches and use a group of array-dominated applications to validate our approach.

Juan Chen, Yong Dong, Huizhan Yi, Xuejun Yang

A Dynamic Energy Conservation Scheme for Clusters in Computing Centers

HPC clusters are widely used to execute parallel tasks. With the increasing number of nodes and frequency of processors, they consume huge amount of energy. The heat generated by clusters also imposes very heavy load for cooling infrastructures. The utilization of some clusters is not always high, indicating that there is a huge space to conserve energy consumption with more intelligent energy management scheme. Although there has been some energy conservation schemes proposed for web clusters, they are not applicable to HPC clusters. In this paper we propose a dynamic energy conservation scheme for HPC clusters. The scheme is to turn some cluster nodes on and off dynamically according to the current and historical workload. The goal is to reduce the energy consumption of clusters with minimal performance loss. We evaluate our scheme by simulation and show that it can effectively conserve energy consumption.

Wenguang Chen, Feiyun Jiang, Weimin Zheng, Peinan Zhang

Track 5: Hardware/Software Co-design and System-On-Chip

Realization of Video Object Plane Decoder on On-Chip Network Architecture

System-on-chip (SoC) designs provide integrated solutions to challenging design problems in the telecommunications, multimedia, and so on. Present and future SoC are designed using pre-existing components which we call cores. Communication between the cores will become a major bottleneck for system performance as standard hardwired bus-based communication architectures will be inefficient in terms of throughput, latency and power consumption. To solve this problem, a packet switched platform that considers the delay and reliability issues of wires so called Network-on-Chip (NoC) has been proposed. In this paper, we present interconnected network topologies and analyze their performances with a particular application under bandwidth constrains. Then we compare the performances among different ways of mapping the cores onto a Mesh NoC architecture. The comparison between Mesh and Fat-Tree topology is also presented. These evaluations are done by utilizing NS-2, a tool that has been widely used in the computer network design.

Huy-Nam Nguyen, Vu-Duc Ngo, Hae-Wook Choi

Network on Chip for Parallel DSP Architectures

Network-on-Chip is a new methodology of System-on-Chip design. It can be used to improve communication performance among many computing nodes of parallel DSP architectures. Simulations based on the 16-node 2D-mesh DragonFly DSP architecture show that the routing distance of 72.9% inter-node communication is 1. A fast local router is proposed to improve the performance of this communication. Experiments on our simulator show that overall inter-node communication delay is decreased by 59.4%.

Yuanli Jing, Xiaoya Fan, Deyuan Gao, Jian Hu

A New Methodology of Integrating High Level Synthesis and Floorplan for SoC Design

As silicon CMOS technology is scaled into the nanometer regime, a whole system can be integrated into one chip. At the same time, the computer-aided design technology is challenged by two major features: the ever-increasing design complexity of gigascale integration and complicated physical effects inherent from the nanoscale technology. In this paper, a new methodology of integrating High Level Synthesis and Floorplan together is presented. The whole design flow is divided into two phases: a fast searching space scan procedure and a detailed solution optimize procedure. The searching space of integrating HLS and Floorplan is first “smoothed” by a “Behavior Information based Cluster Algorithm”, and then a fast scan of this smoothed searching space is proceeded. The result of the first phrase will be used as the start point of the detailed optimize procedure. The experimental result show that the methodology is efficient.

Yunfeng Wang, Jinian Bian, Xianlong Hong, Liu Yang, Qiang Zhou, Qiang Wu

Designing On-Chip Network Based on Optimal Latency Criteria

A new chip design paradigm, so called Network on Chip, has been introduced based on the demand of integration of many heterogeneous semiconductor intellectual property (IP) blocks. The Network on Chip design not only requires the good performance but also the minimization of several physical constraints such as the network latency, the used area as well as the power consumption of design. This paper analyzes the average latency of heterogeneous Network on Chip architectures in which the shortest path routing algorithm is applied. This average latency includes the queuing latency and the wire latency, and is calculated for general cases of allocating IPs onto the fixed generic switching architectures such as 2-D Mesh and Fat-Tree. With different allocation schemes of IPs, the network has different average latencies. Hence, this article presents an optimal search that adopts the Branch and Bound algorithm to find out the optimal mapping scheme to achieve the minimal network latency. This algorithm automatically map the desired IPs onto the target Network on Chip architecture with the criteria of lowest network latency. Some experiments of On Chip Multiprocessor Network application are simulated. The results show that the network latency is significantly saved with the optimized allocation scheme for the several cases of generic architectures of On Chip Multiprocessor Network application.

Vu-Duc Ngo, Huy-Nam Nguyen, Hae-Wook Choi

Track 6: Testing and Verification

Microprocessor Based Self Schedule and Parallel BIST for System-On-a-Chip

The purpose of this paper is to develop a flexible test method with high efficiency for core-based system-on-a-chip (SOC). The novel feature of the approach is the use of an embedded microprocessor/memory pair to test the remaining components of SOCs. The characteristics are: (1) Several IP cores can be tested in parallel; (2) The order of test tasks need not to be queued during test integration, but scheduled by test program. It is called microprocessor based self schedule and parallel BIST for SOC (MBSSP-BIST). By analyzing the bandwidth of test data, the feasibility of MBSSP-BIST is proved. Finally, several SOCs in ITC’02 benchmark are used to verify the approach and the results show that MBSSP-BIST can improve test efficiency significantly.

Danghui Wang, Xiaoya Fan, Deyuan Gao, Shengbing Zhang, Jianfeng An

Self-correction of FPGA-Based Control Units

This paper presents a self-correcting control unit design using Hamming codes for finite state machine (FSM) state encoding. The adopted technique can correct single-bit errors and detect two-bit errors in the FSM register within the same clock cycle. The main contribution is the development of a parameterizable VHDL package and the respective error-correcting modules, which can easily be added to an FSM specification using any state assignment strategy and having any number of inputs, outputs and states. Besides of application to FSM error correction, the developed tools can easily be adapted to other applications where error detection and correction is required.

Iouliia Skliarova

Detecting Memory Access Errors with Flow-Sensitive Conditional Range Analysis

Accessing an out-of-bounds memory address can lead to nondeterministic behaviors or elusive crashes. Static analysis can detect memory access errors from program source codes without runtime overhead, but existing techniques are either very imprecise or exponential cost. This paper proposes a precise and effective method to detect memory access errors. Firstly, it generates a state for each statement with a flow-sensitive, inter-procedural algorithm. A state includes not only range constraints like the traditional range analysis, but also occurrence conditions of the range constraints. Secondly, it solves states of memory access statement to evaluate the sizes of accessed memory bounds. The costs of state generation and state resolution are polynomial. We have implemented a prototype of the analysis method. Applied to 7 popular programs, the prototype found 40 memory access errors with a high precision of 80%.

Yimin Xia, Jun Luo, Minxuan Zhang

Deductive Probabilistic Verification Methods of Safety, Liveness and Nonzenoness for Distributed Real-Time Systems

Recently, model-checking and probabilistic timed simulation verification methods of probabilistic timed automata have been developed. In this paper, we propose probabilistic timed transition systems by generalizing probabilistic timed automata, and propose deductive verification rules of probabilistic real-time linear temporal logic over probabilistic timed transition systems. As our proposed probabilistic timed transition system is a general computational model, we have developed general verification methods.

Satoshi Yamane

Specification and Verification Techniques of Embedded Systems Using Probabilistic Linear Hybrid Automata

We can model embedded systems as hybrid systems. Moreover, they are distributed and real-time systems. Therefore, it is important to specify and verify randomness and soft real-time properties. For the purpose of system verification, we formally define probabilistic linear hybrid automaton and its symbolic reachability analysis method. It can describe uncertainties and soft real-time characteristics. Our proposal method is the first attempt to symbolically verify probabilistic linear hybrid automata.

Yosuke Mutsuda, Takaaki Kato, Satoshi Yamane

Formalization of fFSM Model and Its Verification

PeaCE(Ptolemy extension as a Codesign Environment) was developed for the hardware and software codesign framework which allows us to express both data flow and control flow. The

f

FSM is a model for describing the control flow aspects in PeaCE, but it has difficulties in verifying their specifications due to lack of their formality. Thus we propose the formal semantics of the model based on its execution steps. To verify an

f

FSM model, it is translated into SMV input language with properties to be checked, automatically. As a result, some important bugs such as race condition, ambiguous transition, and circular transition can be formally detected in the model.

Sachoun Park, Gihwon Kwon, Soonhoi Ha

Track 7: Reconfigurable Computing

Dynamic Co-allocation of Level One Caches

The effectiveness of level one (L1) caches is of great importance to the processor performance. We have observed that programs exhibit varying demands in the L1 instruction cache (I-cache) and data cache (D-cache) during execution, and such demands are notably different across programs. We propose to co-allocate the cache ways between the I- and D-cache in responses to the program’s need on-the-fly. Resources are re-allocated based on the potential performance benefit. Using this scheme, a 32KB co-allocation L1 can gain 10% performance improvement on average, which is comparable to a 64KB traditional L1.

Lingling Jin, Wei Wu, Jun Yang, Chuanjun Zhang, Youtao Zhang

Jaguar: A Compiler Infrastructure for Java Reconfigurable Computing

In this paper, we present our compiler infrastructure, called Jaguar for Java reconfigurable computing. The Jaguar compiler translates compiled Java methods, i.e. sequence of bytecodes into Verilog synthesizable code modules with exploiting the maximum operational parallelism within applications. Our compiler infrastructure consists of two major components. One is a compiler to generate synthesizable Verilog codes from Java applications, which performs full compilation passes, such as bytecode parsing, intermediate representation (IR) construction, program analysis, optimization, and code emission. The other component is the Java Virtual Machine (JVM) which provides Java execution environment to the generated Verilog modules. The JVM closely interacts with hardware during the execution through an interrupt method. We discuss the performance issues and code transformation techniques to reduce the interaction overhead in our Java reconfigurable computing environment.

Youngsun Han, Seon Wook Kim, Chulwoo Kim

CCD Camera-Based Range Sensing with FPGA for Real-Time Processing

A depth measurement system that consists of a single camera, a laser light source and a rotating mirror is investigated. The camera and the light source are fixed at position and face a rotating mirror. The laser light is reflected by the mirror and projected to the scene for depth measurement. The camera detects the laser light location on object surfaces through the same mirror. The scan over the measured area is done by mirror rotation. FPGA is used to quickly process the collected images to generate a range image. A 136 × 240 range image can be generated in 2.3 seconds. This speed is 4 times faster than that of a PC with a 2.8GHz clock. If the frame rate can be increased to 300 per second, the speed improvement will be about 15 times. The technology, on the other hand, offers a solution for embedded implementation.

Chun-Shin Lin, Hyongsuk Kim

Track 8: Agent and Distributed Computing

Best Web Service Selection Based on the Decision Making Between QoS Criteria of Service

Recently, extensive studies have been carried out on web service standards because of the necessity of developing large-scale applications in open environments. In particular, they enable services to be dynamically bound. However, current techniques fail to address the critical problem of selecting the right service instances. Service selection should be determined based on customer preferences and service level. We propose a best web service selection method which helps to find a service provider providing the optimal quality. Web service selection process was described with multi-criteria decision making approach(e.g. PROMETHEE) on the basis of evaluated values of qualities and the defined service level. The PROMETHEE method has advantages in comparison with the others(e.g. MAUT, AHP) as follows. First, the PROMETHEE method classifies alternatives which is difficult to be compared because of a trade-off relation of evaluation standards as non-comparable alternatives. Second, the PROMETHEE method is different from the AHP method in that there’s no need to perform a pair-wise comparison again when comparative alternatives are added or deleted. Therefore, this method is a suitable approach in the web service selection problem. Because the problem has a lot of quality parameters which are measured and evaluated at the same time and frequently induces a drop of another quality parameter by the improvement of one quality attribute. Consequently, our approach enables applications to be dynamically configured at runtime in a manner that continually adapts to the preferences of the customers. We verify our approach through case study.

Young-Jun Seo, Hwa-Young Jeong, Young-Jae Song

Data Storage in Sensor Networks for Multi-dimensional Range Queries

In data-centric sensor networks, various data items, such as temperature, humidity, pressure and so on, are sensed and stored in sensor nodes. As these attributes are mostly scalar values and inter-related, multi-dimensional range queries are very useful. However, the previous work on range query processing in sensor networks did not consider overall network lifetime. To prolong network lifetime and support multi-dimensional range queries, we propose a dynamic data placement method for multi-dimensional data, where data space is divided into equal-sized regions and placed over sensor nodes in a dynamic way. Through experiments, we show the efficiency of the proposed method compared with the previous work.

Ji Yeon Lee, Yong Hun Lim, Yon Dohn Chung, Myoung Ho Kim

An OSEK COM Compliant Communication Model for Smart Vehicle Environment

Smart Vehicle Environment (SVE) is an important application of the idea of smart spaces. This paper presents Smart Vehicle Multi-Agent System (SVMAS) to achieve the goal of SVE and put forwards a commutation model for SVMAS based on SmartOSEK COM [1] to support data exchange. The paper also presents an approach to encapsulate the message to transport by CAN bus, and bring forward a simulator model for SVMAS. Finally the paper gives an example of the communication model which implements a dialogue between two agents and analyzes the performance. The contribution of our work is threefold. First, we adopt Knowledge Query and Manipulation Language (KQML) to describe the communication in vehicles. Second, we develop SmartOSEK COM to implement communication in vehicles. Third, we define the ACLcan protocol to transform the message from SmartOSEK COM to CAN frame.

Guoqing Yang, Minde Zhao, Lei Wang, Zhaohui Wu

Track 9: Wireless Communications

Resource Allocation Based on Traffic Load over Relayed Wireless Access Networks

In this paper, we develop a traffic load-based resource allocation scheme, called LOAD, to enhance the capacity of relayed wireless access networks for asymmetric traffic load such as transmission control protocol (TCP). In order to estimate the current traffic load status in relayed wireless access networks, we propose a load estimation method. A relay gateway estimates the current traffic load status by keeping track of the sizes of the frames it encounters, and computes accordingly the current traffic load of the uplink and the downlink. The results are then used to allocate the system resource between the uplink and the downlink. The proposed method can be implemented without the modification of the deployed IEEE 802.11 nodes.

We analyze the throughput ratio between the uplink and the downlink, and validate the analysis result with a comprehensive simulation study. The simulation results indicate that the utilization of the proposed method is better than that of IEEE 802.11 Distributed Coordination Function (DCF).

Sung Won Kim, Byung-Seo Kim

An Adaptive Cross Layer Unequal Protection Method for Video Transmission over Wireless Communication Channels

In this paper, a new scheme, called Adaptive Cross Layer Unequal Protection (ACLUEP), has been proposed for video transmission over wireless channels. The proposed scheme performs joint optimization across application layer, link layer and physical layer to provide unequal protection ability for video bit-streams with different priority levels. Analysis and simulation results show an extraordinary improvement in Peak Signal-to-Noise Ratio (PSNR) of the proposed method over a variety of channel conditions.

Jinbo Qiu, Guangxi Zhu, Tao Jiang

Power-Efficient Packet Scheduling Method for IEEE 802.15.3 WPAN

Power efficiency is the key issue for mobile devices, which mainly rely on limited battery power. The IEEE 802.15.3 wireless personal area network (WPAN) standard adopts a time division multiple access (TDMA) protocol controlled by a central device to support isochronous traffics. In the TDMA-based wireless packet networks, the packet scheduling algorithm plays a key role in power efficiency. However, the standard suffers from long access delay and association delay which increase the power consumption. In this paper, we propose a packet scheduling method to improve the power efficiency. Performance evaluations are carried out through simulations and significant performance enhancements are observed. Furthermore, the performance of the proposed scheme remains stable regardless of the variable system parameters such as the number of devices and superframe size.

Sung Won Kim, Byung-Seo Kim

Two Energy-Efficient, Timesaving Improvement Mechanisms of Network Reprogramming in Wireless Sensor Network

Loading program code to sensor nodes is a basic function of wireless sensor network. However, it is infeasible to gather all sensor nodes once they are deployed. Thus, network reprogramming is greatly required. In this paper, we present two improvement mechanisms to current network reprogramming approaches: the resident mechanism to reduce the size of binary code to be disseminated and stored, and the hierarchical full indexing with sliding window mechanism to record lost code capsules. Both mechanisms can reduce radio transmission and storage access, which are main consumers of energy and time. Accordingly, our mechanisms are energy-efficient and timesaving.

Bibo Wang, Yu Chen, Hongliang Gu, Jian Yang, Tan Zhao

On Location-Free Node Scheduling Scheme for Random Wireless Sensor Networks

In this paper, we have done thorough mathematical analysis and extensive simulations on the distributed, lightweight and location-free node scheduling scheme proposed in [11]. The basic idea of this scheduling scheme is to organize sensor nodes into disjoint node sets, which work alternately to extend network lifetime effectively. Distinguished from the work in [11], we reevaluate the performance of this scheduling scheme under different assumption that sensor nodes are deployed randomly in the target region according to a Poisson point process, which is a more realistic deployment model in large scale randomly deployed sensor networks. We also analyze the performance in terms of average event detection latency, which is another straightforward coverage quality measure. Our analysis results reveal the relationship among coverage quality, expected network lifetime and node deployment intensity. Impact of normally distributed time asynchrony on network coverage quality is also investigated.

Jie Jiang, Chong Liu, Guofu Wu, Wenhua Dou

Leading Causes of TCP Performance Degradation over Wireless Links

TCP is known to have performance degradation over wireless links but causes of the performance degradation have not been well studied. In order to understand the causes and to gain insight for future enhancements, we design a series of simulations to collect performance data and use stepwise multiple regression to find the leading causes. Our analysis indicates that timeout is the dominant cause of wireless TCP performance degradation. Simulations show current enhancements fail to improve the timeout behavior and thus have limited improvement. Based on these findings, we propose a new enhancement that uses ECN to deliver congestion signals and utilizes the coherence among congestion signals to distinguish wireless losses from congestion losses. Simulation results demonstrate that this enhancement thoroughly changes TCP’s timeout behavior and improves the overall performance to a new level.

Chunlei Liu

The Study and Implementation of Wireless Network Router NPU-1

Wireless network has been used widely because its convenience, agility and no cable. But we can find the critical problems for wireless LAN are communication bandwidth, reliability and security. This paper will introduce our wireless network router NPU-1. It has not only generic route functions, but it also has some special functions for wireless network such as access point, wireless bonding, PPPoW etc. Specially, in order to solve the wireless network problems of bandwidth limit and unstableness, we have put forward a self-adaptive bonding method which can solve above problems well. The key issues for this technology are bundling multiple wireless connection to improve the bandwidth and selecting better channel automatically based on monitoring the signal strength and communication quality to improve the bandwidth and stableness. This router can support wireless connection automatic recovery, automatic fail-over and it can support to connect wireless and wired LAN easily and seamlessly.

Yi’an Zhu

Track 10: Mobile Computing

Performance Evaluation of Air Indexing Schemes for Multi-attribute Data Broadcast

In this paper, we study power conservation techniques for multi-attribute queries in a wireless data broadcast environment. Most existing indexing techniques are based on a centralized tree structure and thus are inefficient for sequential-access wireless broadcast media. To conserve energy for mobile devices while maintaining acceptable data access latency, we extend the

exponential index

for single-attribute queries to multi-attribute queries. By maintaining a distributed structure and making full use of indexing space, the exponential index can reduce the energy consumption considerably. We conduct experiments to evaluate the performance of the extended exponential index against the well-known distributed tree index. The results show that the exponential index achieves a better performance than the index tree method.

Qing Gao, Shanping Li, Jianliang Xu

Hierarchical Route Optimization in Mobile Network and Performance Evaluation

With a current basic Network Mobility (NEMO) Support, all the communications to and from a node in a mobile network must be able to go through the bi-directional tunnel established between the Mobile Router and its Home Agent when the mobile network is away from the home. One of the issues in designing mobile network with MR-HA bi-directional tunnel is to solve the route optimization problem in the nested mobile networks. Since the aggregated hierarchy of mobile networks becomes a single nested mobile network, in order to forward packets to the nested mobile network nodes, multiple levels of bi-directional nested tunnels are required. We propose a hierarchical mechanism that allows direct packet tunneling between HA and MR and allows localized mobility management for MR.

Keecheon Kim, Dongkeun Lee, Jae Young Ahn, Hyeong Ho Lee

Track 11: Pervasive/Ubiquitous Computing and Intelligence

Swarm Based Sensor Deployment Optimization in Ad Hoc Sensor Networks

In ad hoc sensor networks, sensor nodes have very limited energy resources, thus energy consuming operations such as data collection, transmission and reception must be kept at a minimum. This paper applies particle swarm optimization (PSO) approach to optimize the coverage in ad hoc sensor networks deployment and to reduce cost by clustering method based on a well-known energy model. Sensor nodes are assumed to be mobile, and during the coverage optimization process, they move to form a uniformly distributed topology according to the execution of algorithm at base station. The simulation results show that PSO algorithm has faster convergence rate than genetic algorithm based method while demonstrating good performance.

Wu Xiaoling, Shu Lei, Yang Jie, Xu Hui, Jinsung Cho, Sungyoung Lee

Weighted Localized Clustering: A Coverage-Aware Reader Collision Arbitration Protocol in RFID Networks

This paper addresses a weighted localized scheme and its application to the hierarchical clustering architecture, which results in reduced overlapping areas of clusters. Our previous proposed scheme,

Low-Energy Localized Clustering (LLC)

, dynamically regulates the radius of each cluster for minimizing energy consumption of cluster heads (CHs) while the entire network field is still being covered by each cluster in sensor networks. We present

weighted Low-Energy Localized Clustering(

w

-LLC)

, which has better efficiency than LLC by assigning weight functions to each CH. Drew on the

w

-LLC scheme,

weighted Localized Clustering for RFID networks(

w

-LCR)

addresses a coverage-aware reader collision arbitration protocol as an application.

w

-LCR is a protocol that minimizes collisions by minimizing overlapping areas of clusters.

Joongheon Kim, Wonjun Lee, Jaewon Jung, Jihoon Choi, Eunkyo Kim, Joonmo Kim

A Kind of Context-Aware Approach Based on Fuzzy-Neural for Proactive Service of Pervasive Computing

The task-oriented proactive seamless migration is one of difficult problems to be solved in pervasive computing paradigm. Apparently, this function of seamless mobility is suitable for mobile services, such as mobile Web-based learning. But when seamless migration for computing task of learning is realized among PC, laptop, or PDA, there are several difficult problems to be solved, such as how to supply the proactive/attentive service with uncertainty for aware context. In order to realize E-learning based on proactive seamless migration, we design and improve relative fuzzy-neural approach (of course, besides it, there are other approaches). Generally, the network can be classified into two. One is that fuzzy logic reasoning is completed by fuzzy weight in neural system. The other is that the input data must be fuzzified in the first or second level, but not weight. We discuss and study the second in this paper. For proactive decision, fusion method based on fuzzy-neural can make Web-based learning system keep advantage of fuzzy logic system and remain adaptive optimum in proactive/attentive service. The correctness and validity of our new approach have been tested.

Degan Zhang, Guangping Zeng, Xiaojuan Ban, Yixin Yin

Track 12: Multimedia and Human-Computer Interaction

A Novel Block-Based Motion Estimation Algorithm and VLSI Architecture Based on Cluster Parallelism

This paper proposes a novel block-based motion estimation algorithm and the corresponding architecture based on cluster parallelism. In this algorithm, up to 18 predictors are employed to improve the encoding quality while the computation time is not increased compared with PMVFAST. Experiment results verify the superiority of the proposed algorithm and architecture. The PSNR improvement effect on PMVFAST is 8.14 times higher than that of existing enhanced algorithm EPZS. In particular, they greatly improve the encoding quality of video sequence with large or irregular motion. Designed and synthesized on SMIC 0.18um technology, the architecture works on the frequency of 200MHz. Its throughput is about 15 times higher than the well-known FS architecture with 16 PEs while it consumes only 9.1% memory bandwidth of the FS architecture.

Tie-jun Li, Si-kun Li

Software-Based Video Codec for Mobile Devices

With the rapid development of wireless networks and consumer electronics, various mobile applications have emerged. However, due to some constraints such as weak computational power, limited memory and small display screen, traditional video coding applications can not work well on mobile devices. In this paper, we proposed a software-based video codec framework and its implementation which is suitable for real-time video coding applications on mobile devices. Some key optimizing techniques, such as fast predictive motion estimation (ME), zero-coefficients prejudgment and multiplierless integer discrete cosine transform (DCT), are used in our codec. Experimental results demonstrate the flexibility of our framework and the good speedup we achieved while video quality degradation is negligible. The codec is suitable for scenarios where low-complexity computing is required.

Jiajun Bu, Yuanliang Duan, Chun Chen, Zhi Yang

Real-Time Expression Mapping with Ratio Image

Video Conference under low bandwidth condition, such as conference among hand-held sets (PDAs), requires transmission algorithm to be robust and effective. The real-time issue is always the focus of research. We propose in this paper a fast algorithm in mapping one’s facial expression onto another’s face. The algorithm divides workload into online and offline part and hence speeds up the performance. By using this, it is possible to transmit small data of facial features to end users of conference participants. Expressions can be then synthesized at end user side to produce pseudo-video-sequence of the participants. Thus, real-time transmission can be achieved.

Weili Liu, Cheng Jin, Jiajun Bu, Chun Chen

Power Consumption Analysis of Embedded Multimedia Application

In the past few years, the ubiquity of embedded mobile computing brings about a new challenge for multimedia application. Not only the performance but also the power consumption is vital to the multimedia application because mobile clients are powered by battery. In this paper, we make the detailed power consumption analysis of multimedia applications with two power reduction techniques—ideal clock gating and ideal power gating. Experimental results show the power consumptions can be reduced to 35.22% and 15.68% by ideal clock gating and ideal power gating, respectively. Our primary contributions lie in evaluating the power characteristics of multimedia applications using MediaBench benchmark suite, and evaluating the impact of unideal power reduction techniques on the performance. Such an analysis can help the multimedia applications developers determine the efficient power optimization policy besides the performance optimization. Low-power embedded multimedia applications are promising in the future.

Juan Chen, Yong Dong, Huizhan Yi, Xuejun Yang

Track 13: Network Protocol, Security and Fault-Tolerance

A Dynamic Threshold and Subsection Control TCP Slow-Start Algorithm

DSSC, a Dynamic Slow-start threshold and Subsection Control TCP Slow-start algorithm, is proposed. The key technologies of Vegas and TCP Westwood are applied to the first slow start process in DSSC, which dynamically configures TCP Slow-Start threshold and adaptively adjusts the increasing rate of TCP transmitting windows. DSSC can reach the steady state rapidly because its configuration of slow-start threshold is based on the bandwidth estimation, thus the lost packages will be limited and the entrance of congestion avoidance stage will not be too early. An important phenomenon, the TCP congestion bottleneck buffer response, is discovered, and the reason of this phenomenon is given. The result of simulation proves that this algorithm can avoid data packets loss, get to steady state quickly, and improve TCP throughput on complex network. This algorithm is robust to bottleneck buffer, adapts to WEB service, and is compatible with the present TCP protocol. Finally It is simple and practical in that it only modifies the sender of TCP.

ShiNing Li, JiPing Fang, Zheng Qin, XingShe Zhou

An Improved DRR Packet Scheduling Algorithm Based on Even Service Sequence

With the emerging of many new kinds of network services, it is critical for the network to give service differentiation and QoS guarantee. One of the most important components in existing QoS frameworks is packet scheduler. A good scheduler should provide QoS guarantee and, at the same time, show low complexity. However, existing algorithms often fail to provide the two features at the same time. This paper proposes an improved DRR-like packet scheduling algorithm based on even service sequence, which combining advantages of DRR and WF

2

Q. Our simulation experiments show that this algorithm can provide good fairness, low scheduling delay and low complexity.

Fan Zhang, Shoumeng Yan, XingShe Zhou, Yaping Wang

An Improvement on Strong-Password Authentication Protocols

Password authentication schemes can be divided into two types. One requires the easy-to-remember password, and the other requires the strong password. In 2000, Sandirigama et al. proposed a simple and secure password authentication protocol (SAS). Then, Lin et al. showed that SAS suffers from two weaknesses and proposed an improvement (OSPA) in 2001. However, Chen and Ku pointed out that both SAS and OSPA are vulnerable to the stolen-verifier attack. We also find that these two protocols lack the property of mutual authentication. Hence, we propose an improvement of SAS and OSPA to defend against the stolen-verifier attack and provide mutual authentication in this paper.

Ya-Fen Chang, Chin-Chen Chang

Two-Step Hierarchical Protocols for Establishing Session Keys in Wireless Sensor Networks

Secure communication between sensor nodes is required in most of sensor networks, especially those deployed in a hostile environment. Due to the limited energy and computational capability on each sensor node, a public key cryptosystem is not a viable option for a wireless sensor network. Hence, the idea of key pre-distribution has been widely adopted in most of the session key establishment protocols proposed so far. In this paper, 1) several typical session key establishment protocols are analyzed and compared in terms of common criteria, 2) the requirements for improving upon the existing protocols are derived, and 3) two advanced protocols which take a two-step hierarchical approach to satisfying the requirements are proposed. Through the performance analysis, it has been shown that the proposed protocols improve the connectivity of a sensor network, uniqueness of session keys and security over the existing protocols.

Kyungsan Cho, Soo-Young Lee, JongEun Kim

A Revenue-Aware Bandwidth Allocation Model and Algorithm in IP Networks

This paper proposes a generic revenue-aware bandwidth allocation (RBA) model. This model satisfies diverse QoS requirements of different IP services with revenue as the optimization criterion. Based on this generic model, this paper provides Enhanced Greedy Algorithm (EGA) to solve RBA problems. Unlike other algorithms, the proposed algorithm is deterministic and can be calculated in polynomial time. The experiments on a switched router show that EGA is efficient and applicable for embedded systems.

Meng Ji, Shao-hua Yu

Control Flow Error Checking with ISIS

The Interleaved Signature Instruction Stream (ISIS) is a signature embedding technique that allows signatures to co-exist with the main processor instruction stream with a minimal impact on processor performance, without sacrificing error detection coverage or latency.

While ISIS incorporate some novel error detection mechanisms to assess the integrity of the program executed by the main processor, the limited number of bits available in the signature control word question if the detection mechanisms are effective detecting errors in the program execution flow. Increasing the signature size would negatively impact the memory requirements, so this option has been rejected. The effectiveness of such mechanisms is an issue that must be addressed. This paper details those checking mechanisms included within the ISIS technique that are responsible of the assessment of the integrity of the processor execution flow and the experiments carried out to characterize their coverage.

Francisco Rodríguez, Juan José Serrano

Support Industrial Hard Real-Time Traffic with Switched Ethernet

This paper presents a simple and efficient switched Ethernet communication protocol for industrial hard real-time LAN applications. The network is founded with end nodes and a switch, and hard real-time communication is handled by software added between the Ethernet protocols and the TCP/IP suites. We established a virtual link of the source and destination node by applying admission control based upon the requested QoS, and manages hard real-time traffic to bypass the TCP/IP stacks. This bypassing considerably reduces the dwell time in the nodes, and increases the achievable data frame rate. After the bypassing, hard real-time traffic schedule is executed according to dynamic-priority EDF algorithm.

The protocol does not need any modifications in the Ethernet hardware and coexists with TCP/IP suites, therefore, the LAN with the protocol can be connected to any existing Ethernet networks and support both real-time traffic and best-effort traffic. Compared to some conventional hard real-time network protocols, the proposed one has better real-time performances, and meets the requirements of reliability for hard real-time system applications.

Alimujiang Yiming, Toshio Eisaka

Integer Factorization by a Parallel GNFS Algorithm for Public Key Cryptosystems

RSA is a very popular public key cryptosystem for encryption and authentication. The security of RSA mainly relies on the difficulty of factoring large integers. Recent advancement in factoring algorithms have made it possible to factor integers with 150-digits or more. The General Number Field Sieve algorithm (GNFS) is currently the best known method for factoring large numbers over 110 digits. Although the GNFS algorithm is efficient, it still takes a long time to factor a large integer such as an integer with 150-digits or larger. In this paper, we present a parallel GNFS implementation on a SUN-cluster. It can successfully factor integers up to 116 digits very quickly. The experimental results have demonstrated that the algorithm achieves good speedup and can be used for further larger integer factorization.

Laurence Tianruo Yang, Li Xu, Man Lin

Localized Energy-Aware Broadcast Protocol for Wireless Networks with Directional Antennas

We consider broadcast protocols in wireless networks that have limited energy and computation resources. The well-known algorithm, DBIP (Directional Broadcast Incremental Power), which exploits “Incremental Power” philosophy for wireless networks with directional antenna to construct broadcasting tree, provides very good results in terms of energy savings. Unfortunately, its computation is centralized, as the source node needs to know the entire topology of the network. Mobility of nodes or frequent changes in the node activity status (from “active” to “passive” and vice-versa) may cause global changes in topology which must be propagated throughout the network for any centralized solution. This may results in extreme and un-acceptable communication overhead. In this paper, we propose and evaluate a localized energy-efficient broadcast protocol, Localized Directional Broadcast Incremental Power Protocol (LDBIP), which employs distributed location information and computation to construct broadcast trees. In the proposed method, a source node sets up spanning tree with its local neighborhood position information and includes certain hops relay information in packet. Directional antennas are used for transmitting broadcast packet, and the transmission power is adjusted for each transmission to the minimal necessary for reaching the particular neighbor. Relay nodes will consider relay instructions received to compute their own local neighborhood spanning tree and then rebroadcasts. Experimental results verify that this new protocol shows similar performance with DBIP in static wireless networks, and better performance in mobile scenarios.

Hui Xu, Manwoo Jeon, Lei Shu, Wu Xiaoling, Jinsung Cho, Sungyoung Lee

Track 14: Workshop Selected Papers

The Optimal Profile-Guided Greedy Dynamic Voltage Scaling in Real-Time Applications

Compiler-directed dynamic voltage scaling (DVS) is an effective low-power technique in real-time applications, where compiler inserts voltage scaling points in a real-time application, and supply voltage and clock frequency are adjusted to the relationship between the remaining time and the remaining workload at each voltage scaling point. Greedy dynamic voltage scaling is one of the voltage adjustment schemes, where the slack time of current section is completely used to reduce the clock frequency of next section. In this paper we present the analytical model of the greedy scheme, and by simulations using the analytical model, we find out that the greedy scheme obstructs itself from effectively utilizing the slack times. So we propose a profile-guided greedy voltage adjustment scheme directed by the optimal real-time voltage scheduling in the most frequent execution case. We show by simulations that the new voltage adjustment scheme obtains the largest reduction of energy consumption of all the current representative schemes.

Huizhan Yi, Xuejun Yang, Juan Chen

A Parallelizing Compiler Approach Based on IXA

In this paper, we first analyze the parallel characteristics of both network processor and network application. Our analysis shows that the development using the existed two programming interfaces on IXA is complicated. This suggests that a programming environment which can abstract away architectural details from the developers and can automatically map application to resources is on desire. Thus we introduce a parallelizing compiler developed by Intel called IXP C compiler and analyze its performance on two different mapping forms by compiling packet_processing_pps of ipv4_diffserv-1*10G_Ethernet-egress. Finally we discuss the shortcomings of partition algorithm and also give some suggestions for the future work.

Ting Ding, Naiqi Liu

The Design of Firewall Based on Intel IXP2350 and Autopartitioning Mode C

This paper introduces a design of firewall based on Intel IXP2350 Network Processor. The functions of this firewall include the packet filtering, state inspection, VPN, NAT/PT and etc. The development language used is Auto-partitioning Mode C (IXP-C). Our development process suggests that IXP-C reduces the development time for the firewall project.

Zhang Ke, Liu Naiqi, Chen Yan

AMT6: End-to-End Active Measurement Tool for IPv6 Network

Since IPv6 has more benefits over IPv4, the development and deployment of the IPv6 protocol-based products are currently taking place and the migration of IPv4 to IPv6 has also been steadily happen. In these mixed network environment, Network management for both IPv4 and IPv6 is a serious issue. In this paper, we focus on the design and implementation of an active measurement system which can be used for evaluating end-to-end performance of the IPv6/IPv4 network such as end-to-end available bandwidth, one-way delay, and one-way loss. We also describe the procedure of measurement when using AMT6 and some of its features. The IPv6/IPv4 users as well as network operators will be greatly helped to analyze network characteristics using the proposed architectural framework.

Jahwan Koo, Seongjin Ahn

Semantic Web Based Knowledge Searching System in Mobile Environment

In this paper we propose semantic web-based mobile knowledge searching systems for display Web pages on mobile terminals. The proposed system reduces the time of verification when mobile users search for information to which they want, and ontology-based browsing is possible and the computer can use these resources in mobile environment.

Dae-Keun Si, Yang-Seung Jeon, Jong-Ok Choi, Young-Sik Jeong, Sung-Kook Han

A General-Purpose, Intelligent RAID-Based Object Storage Device

In this paper we will address the architecture and implementation of an object-based storage system, and describe the design of the intelligent object-based storage device (OSD), which is built on embedded system of RAID. Our OSD is designed for general purpose according to the T10 standard. It not only can be easily added to the massive network storage system over TCP/IP to increase the quantity and quality of the storage system by adding intelligent OSD to the storage, but also can reduce the TCO (total cost of ownership).Also we extend the T10 standard by adding the object physical layout information to help shorten the access latency and to increase the intelligence for our OSD.

Fang Wang, Song lv, Dan Feng, Shunda Zhang

The Design and Implement of Remote Mirroring Based on iSCSI

Remote mirroring has become increasingly important as organizations and businesses depending more and more on digital information. Balancing the data availability, access performance and the cost of building and maintaining is essential goal for designing remote mirroring system. This paper presents a novel architecture of remote mirroring based on iSCSI and describes the structure of mirroring log and buffer which can improve performance and keep availability of date synchronously. The experience results show the maximal workload of remote mirroring in 100Mbps and gigabit network environment respectively.

Cao Qiang, Guo Tian-jie, Xie Chang-sheng

Improvement of Space Utilization in NAND Flash Memory Storages

Flash Translation Layer (FTL) is the device driver software that makes flash memory device appear to the system like a disk drive. Since flash memory cannot be written over existing data unless erased in advance, the FTL usually employs special address mapping algorithms to avoid having to erase on every data update. In this paper, we propose a new FTL algorithm which considers the access patterns of data blocks. Proposed scheme monitors write access patterns of data blocks and intelligently manages the address mapping to improve the performance. Simulation results show that the proposed scheme improves the space utilization without significant write/update performance degradation.

Yeonseung Ryu, Kangsun Lee

Keynote Speech

Smart u-Things and Ubiquitous Intelligence

Smart u-things are real things with attached, embedded or blended computers, networks, and/or some other devices such as sensors, actors, e-tags and so on, and they can sense, compute, communicate and take some adaptive actions/reactions/proactions according to their goals, situated contexts, users’ needs, etc. It is envisioned that smart u-thing will be everywhere eventually towards ubiquitous intelligence and smart world. One of the profound implications of such ubiquitous smart u-things is that various kinds and levels of intelligence will exist pervasively in real everyday objects, environments, systems and even ourselves, and possibly be extended from man-made to natural things. The ubicomp/percomp can be regarded, in a sense, as the computing of all these smart/intelligent u-things, which are the basic elements and components of the smart world. After clarifying the essential features and three categories of smart u-things, i.e., smart object, smart space and smart system, the talk is devoted to discuss possible challenges in smart u-things’ research in terms of real world complexity. The main intentions are to examine the possible hard issues to suggest some potential research lines; and to let researchers in this field coolhead and being aware of the hardness of these challenges in making real things truly smart.

Jianhua Ma

Springer Professional

About this book

Table of Contents

Frontmatter

Keynote Speech

Are Lessons Learnt in Mobile Ad Hoc Networks Useful for Wireless Sensor Networks?

Compiler-Directed Scratchpad Memory Management

Heterogeneous Multi-processor SoC: An Emerging Paradigm of Embedded System Design and Its Challenges

Track 1: Embedded Hardware

Trace-Based Runtime Instruction Rescheduling for Architecture Extension

Bioinformatics on Embedded Systems: A Case Study of Computational Biology Applications on VLIW Architecture

The Design Space of CMP vs. SMT for High Performance Embedded Processor

Reconfigurable Microarchitecture Based System-Level Dynamic Power Management SoC Platform

Track 2: Embedded Software

A Methodology for Software Synthesis of Embedded Real-Time Systems Based on TPN and LSC

Ahead of Time Deployment in ROM of a Java-OS

The Research on How to Reduce the Number of EEPROM Writing to Improve Speed of Java Card

A Packet Property-Based Task Scheduling Policy for Control Plane OS in NP-Based Applications

RBLS: A Role Based Context Storage Scheme for Sensornet

CDP: Component Development Platform for Communication Protocols

TrieC: A High-Speed IPv6 Lookup with Fast Updates Using Network Processor

Separate Compilation for Synchronous Modules

Implementation of Hardware and Embedded Software for Stream Gateway Interface Supporting Media Stream Transmissions with Heterogeneous Home Networks

Track 3: Real-Time Systems

On Using Locking Caches in Embedded Real-Time Systems

Trace Acquirement from Real-Time Systems Based on WCET Analysis

Elimination of Non-deterministic Delays in a Real-Time Database System

Solving Real-Time Scheduling Problems with Model-Checking

Efficient FPGA Implementation of a Knowledge-Based Automatic Speech Classifier

Track 4: Power-Aware Computing

A Topology Control Method for Multi-path Wireless Sensor Networks

Dynamic Threshold Scheme Used in Directed Diffusion

Compiler-Directed Energy-Aware Prefetching Optimization for Embedded Applications

A Dynamic Energy Conservation Scheme for Clusters in Computing Centers

Track 5: Hardware/Software Co-design and System-On-Chip

Realization of Video Object Plane Decoder on On-Chip Network Architecture

Network on Chip for Parallel DSP Architectures

A New Methodology of Integrating High Level Synthesis and Floorplan for SoC Design

Designing On-Chip Network Based on Optimal Latency Criteria

Track 6: Testing and Verification

Microprocessor Based Self Schedule and Parallel BIST for System-On-a-Chip

Self-correction of FPGA-Based Control Units

Detecting Memory Access Errors with Flow-Sensitive Conditional Range Analysis

Deductive Probabilistic Verification Methods of Safety, Liveness and Nonzenoness for Distributed Real-Time Systems

Specification and Verification Techniques of Embedded Systems Using Probabilistic Linear Hybrid Automata

Formalization of fFSM Model and Its Verification

Track 7: Reconfigurable Computing

Dynamic Co-allocation of Level One Caches

Jaguar: A Compiler Infrastructure for Java Reconfigurable Computing

CCD Camera-Based Range Sensing with FPGA for Real-Time Processing

Track 8: Agent and Distributed Computing

Best Web Service Selection Based on the Decision Making Between QoS Criteria of Service

Data Storage in Sensor Networks for Multi-dimensional Range Queries

An OSEK COM Compliant Communication Model for Smart Vehicle Environment

Track 9: Wireless Communications

Resource Allocation Based on Traffic Load over Relayed Wireless Access Networks

An Adaptive Cross Layer Unequal Protection Method for Video Transmission over Wireless Communication Channels

Power-Efficient Packet Scheduling Method for IEEE 802.15.3 WPAN

Two Energy-Efficient, Timesaving Improvement Mechanisms of Network Reprogramming in Wireless Sensor Network

On Location-Free Node Scheduling Scheme for Random Wireless Sensor Networks

Leading Causes of TCP Performance Degradation over Wireless Links

The Study and Implementation of Wireless Network Router NPU-1

Track 10: Mobile Computing

Performance Evaluation of Air Indexing Schemes for Multi-attribute Data Broadcast

Hierarchical Route Optimization in Mobile Network and Performance Evaluation

Track 11: Pervasive/Ubiquitous Computing and Intelligence

Swarm Based Sensor Deployment Optimization in Ad Hoc Sensor Networks

Weighted Localized Clustering: A Coverage-Aware Reader Collision Arbitration Protocol in RFID Networks

A Kind of Context-Aware Approach Based on Fuzzy-Neural for Proactive Service of Pervasive Computing

Track 12: Multimedia and Human-Computer Interaction

A Novel Block-Based Motion Estimation Algorithm and VLSI Architecture Based on Cluster Parallelism

Software-Based Video Codec for Mobile Devices

Real-Time Expression Mapping with Ratio Image

Power Consumption Analysis of Embedded Multimedia Application

Track 13: Network Protocol, Security and Fault-Tolerance

A Dynamic Threshold and Subsection Control TCP Slow-Start Algorithm

An Improved DRR Packet Scheduling Algorithm Based on Even Service Sequence

An Improvement on Strong-Password Authentication Protocols

Two-Step Hierarchical Protocols for Establishing Session Keys in Wireless Sensor Networks

A Revenue-Aware Bandwidth Allocation Model and Algorithm in IP Networks