
About this book

This book constitutes the refereed proceedings of the 23rd CCF Conference on Computer Engineering and Technology, NCCET 2019, held in Enshi, China, in August 2019.

The 21 full papers presented were carefully reviewed and selected from 87 submissions. They address important and emerging challenges in the field of computer engineering and technology.



Confidence Value: A Novel Evaluation Index of Side-Channel Attack

Side-channel attacks (SCAs) exploit the correlation between power leakage information and the key to carry out the attack. The result of an SCA is probabilistic: when guessing an 8-bit key, there is a 1/256 probability that the key is guessed correctly by coincidence, resulting in a false positive. The reliability of the recovered key therefore also needs an index to measure it. To this end, this paper proposes a novel evaluation index based on the confidence value (CV). The CV of a recovered key is divided into three levels: low false positive, medium false positive, and high false positive. CV provides a new reference index for designers, suppliers, and users of cryptographic devices to evaluate device security.
Xiaomin Cai, Shijie Kuang, Gao Shen, Renfa Li, Shaoqing Li, Xing Hu
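The probabilistic nature of an SCA result can be seen in a minimal correlation power analysis (CPA) sketch. It assumes a Hamming-weight leakage model, a plaintext-XOR-key target, and synthetic noisy traces. Real attacks usually target a nonlinear intermediate such as an S-box output, and the abstract does not define how CV is computed, so the sketch stops at ranking key guesses. With a pure XOR target, the bitwise complement of the key yields a correlation of equal magnitude and opposite sign, which is why guesses are ranked by signed correlation. For comparison, a blind guess of an 8-bit key is correct with probability 1/256, about 0.39%.

```python
import random


def hamming_weight(x: int) -> int:
    """Number of set bits: the assumed leakage model."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_traces(key, n, noise_std, seed=0):
    """Synthetic single-sample 'traces': HW(plaintext XOR key) plus Gaussian noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n)]
    leakage = [hamming_weight(p ^ key) + rng.gauss(0.0, noise_std) for p in plaintexts]
    return plaintexts, leakage


def cpa_rank(plaintexts, leakage):
    """Rank all 256 key guesses by signed correlation, best first."""
    scores = []
    for guess in range(256):
        model = [hamming_weight(p ^ guess) for p in plaintexts]
        scores.append((pearson(model, leakage), guess))
    scores.sort(reverse=True)
    return scores


if __name__ == "__main__":
    pts, leak = simulate_traces(key=0x3C, n=1000, noise_std=1.0)
    ranking = cpa_rank(pts, leak)
    (best_r, best_k), (second_r, second_k) = ranking[0], ranking[1]
    print(f"best {best_k:#04x} (r={best_r:.3f}), runner-up {second_k:#04x} (r={second_r:.3f})")
```

Reducing `n` or raising `noise_std` makes the ranking less stable, which is the kind of situation a reliability index such as CV is meant to flag.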

Design of High Precision Band-Pass Sigma-Delta ADC in MEMS Gyroscope

To meet the MEMS gyroscope's demands for a high-precision, narrow-bandwidth ADC, a high-precision band-pass Sigma-Delta ADC is designed and verified by simulation. The oversampling ratio (OSR) and center frequency are obtained by measuring and analyzing data from the gyroscope. On this basis, a single-loop, sixth-order ADC structure with one-bit quantization is chosen, and a suitable resonator is selected. Simulation of the Simulink model of the ADC shows that its signal-to-noise ratio reaches 132.2 dB and its effective number of bits reaches 21.70 bits, satisfying the MEMS gyroscope's requirements for high precision and narrow bandwidth. The design can therefore guide the transistor-level circuit design of the band-pass Sigma-Delta ADC.
Lin Xiao, Jinhui Tan, Shaoqing Li, Jihua Chen
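The reported figures can be related through the standard conversion between signal-to-noise ratio and effective number of bits for a full-scale sine input, ENOB = (SNR - 1.76)/6.02, together with the usual definition OSR = fs/(2·BW). A small sketch; the function names are illustrative:

```python
def enob_from_snr(snr_db: float) -> float:
    """Effective number of bits for a full-scale sine input: (SNR - 1.76) / 6.02."""
    return (snr_db - 1.76) / 6.02


def ideal_snr_from_enob(bits: float) -> float:
    """Inverse relation: ideal SNR in dB of an N-bit quantizer, 6.02*N + 1.76."""
    return 6.02 * bits + 1.76


def oversampling_ratio(fs_hz: float, bandwidth_hz: float) -> float:
    """OSR = fs / (2 * BW); for a band-pass converter BW is the width of the signal band."""
    return fs_hz / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    print(f"{enob_from_snr(132.2):.2f} bits")  # SNR value taken from the abstract
```

Applied to 132.2 dB, the formula gives about 21.67 bits, close to the reported 21.70 bits. The abstract does not say whether ENOB was computed from SNR or SNDR or how it was rounded, which could account for the small difference.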

An Efficient Rule Processing Architecture Based on Reconfigurable Hardware

In recovering security strings in identity authentication mechanisms, combining a dictionary with string transformation rules is an effective method. However, processing string transformation rules poses challenges in performance and energy efficiency. Existing tools and research are software-based and struggle to meet the needs of practical recovery systems. This paper proposes an efficient rule processing architecture based on reconfigurable hardware, the first to process such rules on an FPGA. A rule processor is designed and implemented on a Xilinx Zynq XC7Z030 chip. Experimental results show that the rule processor outperforms an Intel i7-6700 CPU in typical cases. Its performance-per-watt is 1.4–2.1 times that of an NVIDIA GeForce GTX 1080 Ti GPU and 70 times that of the CPU, effectively improving the speed and efficiency of rule processing.
Mengdong Chen, Xiujiang Ren, Xianghui Xie
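For readers unfamiliar with string transformation rules, the sketch below applies a small subset of single-character rule operations, in the style of password-recovery tools such as hashcat and John the Ripper, to dictionary words. The supported operations are lowercase, uppercase, capitalize, reverse, duplicate, append, prepend, and delete first/last. The operation subset, the treatment of spaces as separators, and the function names are assumptions for illustration; the abstract does not describe the paper's rule set or how it is mapped onto the FPGA.

```python
def apply_rule(rule: str, word: str) -> str:
    """Apply a hashcat/John-the-Ripper-style rule string to one word (small subset)."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op in " :":                 # assumption: space separates ops; ':' is a no-op
            pass
        elif op == "l":
            word = word.lower()
        elif op == "u":
            word = word.upper()
        elif op == "c":                # capitalize first letter, lowercase the rest
            word = word[:1].upper() + word[1:].lower()
        elif op == "r":
            word = word[::-1]
        elif op == "d":
            word = word + word
        elif op == "[":                # delete first character
            word = word[1:]
        elif op == "]":                # delete last character
            word = word[:-1]
        elif op in "$^":               # append / prepend the next character
            if i + 1 >= len(rule):
                raise ValueError(f"operation {op!r} needs an argument")
            i += 1
            word = word + rule[i] if op == "$" else rule[i] + word
        else:
            raise ValueError(f"unsupported operation {op!r}")
        i += 1
    return word


def candidates(dictionary, rules):
    """Yield every (word, rule) combination, as a dictionary-plus-rules attack does."""
    for word in dictionary:
        for rule in rules:
            yield apply_rule(rule, word)
```

Each candidate is independent of the others, which is the kind of parallelism a hardware rule processor can exploit.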

Performance Analysis of Existing SIMD Architectures

SIMD (Single Instruction Multiple Data) architectures are widely used in application domains such as wireless communication, video and audio processing, and control engineering. The abundant data parallelism in these domains makes SIMD architectures a natural match for data processing and performance improvement. However, current SIMD architectures also have critical inefficiencies. To understand them, we carry out an in-depth investigation of the main components of the Long Term Evolution (LTE) protocol, an important wireless communication protocol. The performance investigation is conducted on a cycle-accurate simulator that models the main characteristics of existing SIMD architectures. Based on this investigation, we locate the inefficiencies in two aspects: data communication among different processing units, and support for matrix-style computations. We also study SIMD architectures enhanced in these two aspects and find that their overall performance can be greatly improved.
Chao Cui, Xian Zhang, Zhicheng Jin
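The inter-unit communication cost identified above can be illustrated with a cross-lane tree reduction: summing N lanes takes log2(N) steps, and each step needs a lane shuffle before the element-wise add. The Python model below is a hypothetical illustration, not the simulator used in the paper; `shuffle_down` moves lanes toward lane 0 and fills vacated lanes with zeros, which is a modeling choice. On hardware without an efficient shuffle, each of these steps may cost extra data movement.

```python
def shuffle_down(lanes, offset):
    """Lane i receives lane i+offset; lanes past the end receive 0 (modeling choice)."""
    n = len(lanes)
    return [lanes[i + offset] if i + offset < n else 0 for i in range(n)]


def simd_tree_sum(lanes):
    """Sum all lanes with log2(N) shuffle+add steps; returns (sum, number_of_steps)."""
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a power of two")
    steps = 0
    offset = n // 2
    while offset:
        shifted = shuffle_down(lanes, offset)            # inter-lane communication
        lanes = [a + b for a, b in zip(lanes, shifted)]  # element-wise SIMD add
        offset //= 2
        steps += 1
    return lanes[0], steps
```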

A Coherent and Power-Efficient Optical Memory Access Network for Kilo-Core Processor

Coherent and power-efficient processor-memory interconnects are of great importance for kilo-core processor design. This paper proposes a hybrid photonic architecture for such interconnection. Specifically, a bandwidth-efficient photonic network that also supports coherence management is used for memory accesses between last-level HBM caches and off-chip HMC memory pools. Simulation results show that, compared with conventional electrical interconnects, the hybrid network achieves up to 11% system speedup and up to 6 times energy savings.
Quanyou Feng, Junhui Wang, Hongwei Zhou, Wenhua Dou

CoEM: A Software and Hardware Co-design Event Management System for Middlebox

Stateful middleboxes play a very important role in network security and performance. However, they mostly exist as separate devices distributed across different topological nodes of the network. By analyzing how these middleboxes process packets, we find that they share many common functions, such as flow-state management and packet protocol parsing. Developing these functions redundantly not only wastes considerable human and material resources but also requires specialized expertise, making it extremely error-prone.
To address these issues, we introduce CoEM, a hardware and software co-design event management system for middleboxes. CoEM implements flow classification and flow-state management, and generates basic events during protocol parsing. Event generators derive user-defined events from these basic events, and different middleboxes can be implemented by defining handlers for these events. Since multiple middleboxes define their event handlers separately, we assign priorities to ensure that packets pass through the middleboxes in the correct order. We use the event management system to implement a stateful firewall, and performance tests show improved packet processing speed.
Jianguo Gou, Wenwen Li, Jie Qiu, Hu Lv, Teng Ma
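A software-only sketch can show the event flow described above: basic events from protocol parsing, event generators that derive user-defined events, and handlers run in priority order. Here these pieces build a toy stateful firewall that admits inbound packets only for flows opened from inside. Every name (`EventManager`, `pkt_in`, `pkt_out`, `new_flow`, and so on) is hypothetical; the abstract does not describe CoEM's actual interfaces or its hardware/software split.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Packet:
    src: str
    dst: str
    flags: Set[str] = field(default_factory=set)


FlowTable = Dict[Tuple[str, str], str]
Handler = Callable[[Packet, FlowTable], bool]        # returns False to drop the packet
EventGenerator = Callable[[str, Packet], List[str]]  # basic event -> user-defined events


class EventManager:
    """Hypothetical software model of a priority-ordered, event-driven middlebox chain."""

    def __init__(self) -> None:
        self.flows: FlowTable = {}
        self._generators: List[EventGenerator] = []
        self._handlers: List[Tuple[int, str, Handler]] = []

    def add_generator(self, gen: EventGenerator) -> None:
        self._generators.append(gen)

    def on(self, event: str, priority: int, handler: Handler) -> None:
        # Lower value runs first; the sort is stable, so ties keep registration order.
        self._handlers.append((priority, event, handler))
        self._handlers.sort(key=lambda h: h[0])

    def process(self, basic_event: str, pkt: Packet) -> bool:
        events = {basic_event}
        for gen in self._generators:
            events.update(gen(basic_event, pkt))
        for _priority, event, handler in self._handlers:
            if event in events and not handler(pkt, self.flows):
                return False  # dropped; lower-priority middleboxes never see the packet
        return True


# Toy stateful firewall built from the pieces above.
def syn_generator(basic_event: str, pkt: Packet) -> List[str]:
    return ["new_flow"] if basic_event == "pkt_out" and "SYN" in pkt.flags else []


def track_flow(pkt: Packet, flows: FlowTable) -> bool:
    flows[(pkt.src, pkt.dst)] = "established"
    return True


def allow_established(pkt: Packet, flows: FlowTable) -> bool:
    return (pkt.dst, pkt.src) in flows


firewall = EventManager()
firewall.add_generator(syn_generator)
firewall.on("new_flow", priority=10, handler=track_flow)
firewall.on("pkt_in", priority=10, handler=allow_established)
```

Adding another middlebox means registering more handlers with suitable priorities, without re-implementing flow tracking or parsing.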

A Battery SOC Prediction Method Based on GA-CNN Network and Its Implementation on FPGA

Battery state of charge (SOC) is affected by many uncertain factors, so its exact value is difficult to predict. To address this, a convolutional neural network prediction method optimized by a genetic algorithm is proposed. With voltage_measured, current_measured, temperature_measured, current_load, and voltage_load as the input vectors of the neural network, a genetic algorithm generates the network's initial weights, and the GA-CNN battery SOC prediction model is constructed. The GA-CNN network is implemented in software in C and in hardware on an FPGA. The software implementation verifies the correctness of the algorithm, and the hardware implementation achieves real-time monitoring. Experimental results of the C implementation show that GA-CNN-based battery SOC prediction is more accurate, and the hardware simulation results are consistent with the software results.
Wenzhen Guo, Jinwen Li
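The step of using a genetic algorithm to produce initial weights can be sketched as follows. To stay short, the model is a linear predictor over five inputs rather than a CNN, and the data are synthetic; the five inputs only mirror the feature count named in the abstract. The GA uses tournament selection, uniform crossover, Gaussian mutation, and elitism. These choices are assumptions, not the paper's configuration. The returned weights would then seed conventional training.

```python
import random


def mse(weights, data):
    """Mean squared error of a linear model; the last weight is the bias."""
    total = 0.0
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights[:-1], x)) + weights[-1]
        total += (pred - y) ** 2
    return total / len(data)


def make_synthetic_data(n=30, seed=1):
    """Synthetic samples with five inputs (values are made up, not battery data)."""
    rng = random.Random(seed)
    true_w = [0.5, -0.3, 0.2, 0.1, -0.4]
    data = []
    for _ in range(n):
        x = [rng.uniform(0.0, 1.0) for _ in range(5)]
        data.append((x, sum(w * xi for w, xi in zip(true_w, x)) + 0.05))
    return data


def ga_init_weights(data, n_weights, pop_size=30, generations=40,
                    mutation_std=0.1, seed=0):
    """Evolve a weight vector that minimizes MSE; use it as the initial weights."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((mse(w, data), w) for w in pop), key=lambda t: t[0])
        next_pop = [w for _, w in scored[:2]]  # elitism: carry over the two best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (size-3 tournaments).
            a = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            b = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            # Uniform crossover followed by Gaussian mutation.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            next_pop.append([c + rng.gauss(0.0, mutation_std) for c in child])
        pop = next_pop
    return min(pop, key=lambda w: mse(w, data))
```

Because the two best individuals survive each generation unchanged, the best error never gets worse as generations are added.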

The Implementation of a Configurable MBIST Controller for Multi-core SoC

To address the high memory test power caused by the increasing proportion of embedded memory in multi-core SoCs, this paper analyzes the existing issues and proposes a configurable MBIST controller that reduces test power consumption. The controller uses an MBIST configuration scan chain to organize test groups and a configurable PLL scan chain to drive the memories at their working frequency. A clock optimization method is also adopted to reduce test power. The proposed method offers low test power, flexible test configuration, and low hardware overhead, and it can also locate failing memories. Testing of the multi-core SoC on an ATE V93000 shows that the proposed method effectively reduces power consumption and meets the memory test requirements.
Chunmei Hu, Xiaoxuan Li, Zhigang Fu, Qianqian Tang, Rong Zhao
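Organizing memories into test groups to cap test power can be pictured as a bin-packing problem. Memories tested concurrently form a group whose summed test power must stay under a budget, and groups run one after another. The first-fit-decreasing heuristic, the power figures, and the memory names below are hypothetical; the abstract does not say how the MBIST configuration scan chain chooses groups.

```python
from typing import Dict, List


def group_memories(power_mw: Dict[str, float], budget_mw: float) -> List[List[str]]:
    """Pack memories into concurrent test groups whose summed test power fits the budget.

    First-fit decreasing: largest memories are placed first, each into the first group
    with room. Groups are tested sequentially, so peak power is the largest group total.
    """
    groups: List[List[str]] = []
    totals: List[float] = []
    for name in sorted(power_mw, key=power_mw.get, reverse=True):
        p = power_mw[name]
        if p > budget_mw:
            raise ValueError(f"{name} alone exceeds the power budget")
        for i, total in enumerate(totals):
            if total + p <= budget_mw:
                groups[i].append(name)
                totals[i] += p
                break
        else:  # no existing group has room: open a new one
            groups.append([name])
            totals.append(p)
    return groups


if __name__ == "__main__":
    # Hypothetical per-memory test power in mW.
    mems = {"l2_data": 120.0, "l2_tag": 40.0, "l1i_c0": 30.0,
            "l1d_c0": 35.0, "l1i_c1": 30.0, "l1d_c1": 35.0}
    print(group_memories(mems, budget_mw=150.0))
```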

Structure Design of a Fully Enclosed Airborne Reinforcement Computer

The structural design method of a fully enclosed airborne reinforcement computer is discussed, and the design scheme is introduced. The chassis structure, thermal design, electromagnetic compatibility, and anti-vibration design of the case are described in detail, and technical solutions and main structural diagrams are given for the key technical problems. The fully enclosed computer described in this paper has been applied in engineering practice and performs well in harsh environments.
Jiang Feng Huang

Design Discussion and Performance Research of the Third-Level Cache in a Multi-socket, Multi-core Microchip

The L3 cache is an essential part of microchips and is integrated into most of them, such as Intel and AMD chips. The FeiTeng series consists of independently researched and designed microchips. Our research is based on a 64-core, multi-socket FeiTeng chip, for which an L3 cache is designed to increase performance. This paper first discusses the design of the L3 cache and then studies two crucial evaluation indexes, latency and bandwidth. Simulation shows that enabling the L3 cache reduces latency by up to 10% compared with disabling it. Moreover, with the L3 cache enabled, bandwidth doubles when a small amount of data is accessed. This analysis indicates that in a multi-socket, multi-core system, the L3 cache can significantly improve system performance.
Nan Li, Rangyu Deng, Ying Zhang, Hongwei Zhou

An Efficient and Reliable Retransmission Mechanism for On-Chip Network of Many-Core Processor

Building a reliable and efficient network-on-chip (NoC) has always been an important part of research on many-core processor architecture. In this paper, we propose a retransmission mechanism for many-core processors using a dynamic pipeline and static flow control, which can break the deadlock caused by shared channels in a 2D mesh NoC. The key parameters of this mechanism are configured on the basis of modeling. Modeling analysis and actual test results show that the retransmission mechanism not only avoids congestion and deadlock in the NoC but also meets network bandwidth and memory access performance requirements, provided that the depths of the retransmission sender queue and the retransmission receiver queue are set properly for different address mapping modes and transmission delays.
Jianmei Luo, Hongwei Zhou, Ying Zhang, Nan Li, Ying Wang
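The role of the retransmission queue depth can be illustrated with the usual bandwidth-delay argument. A sender must buffer every flit until it is acknowledged, so sustaining full link rate requires at least round-trip time × injection rate entries, and a shallower queue caps throughput at depth/RTT. This is a generic rule of thumb with hypothetical parameter names, not the model derived in the paper.

```python
import math


def min_retx_queue_depth(forward_delay, ack_processing, ack_delay, flits_per_cycle=1.0):
    """Smallest sender retransmission queue (in flits) that never stalls a full-rate link.

    All delays are in cycles. Every flit must stay buffered until its acknowledgement
    returns, so the queue must hold one round trip's worth of flits.
    """
    rtt = forward_delay + ack_processing + ack_delay
    return math.ceil(rtt * flits_per_cycle)


def sustained_throughput(depth, rtt, link_rate=1.0):
    """Flits per cycle achievable when at most `depth` flits may be unacknowledged."""
    return min(link_rate, depth / rtt)
```

Longer transmission delays, for example to distant address-mapped targets, raise the round trip and therefore the depth needed, which is consistent with the abstract's point that depths must be set per mapping mode and delay.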

Anti-vibration Performance and Electromagnetic Compatibility Design for the Shipborne Reinforced Computer

The ability of a shipborne computer to withstand harsh environments plays a very important role in ensuring the stability of the warship's systems. This paper discusses the design of a shipborne computer in depth, using ANSYS to perform modal analysis of the equipment's vibration and elaborating on the electromagnetic compatibility design of the reinforced computer. Computer simulation and experiments show that the design of this shipborne reinforced computer ensures its resistance to vibration and shock and its electromagnetic compatibility, giving it good overall protection performance.
Guangle Qin

Effect of Passivating and Metallization Layers on Low Energy Proton Induced Single-Event Upset

Using Monte Carlo and TCAD simulation, we investigate the effect of passivating and metallization layers on low-energy proton-induced SEU in a commercial SRAM cell. Simulation results indicate that metallization layers and tungsten contacts significantly reduce proton energy and broaden the energy distribution. Therefore, they can decrease the SEU percentage of the commercial SRAM cell.
Ruiqiang Song, Jinjin Shao, Bin Liang, Yaqing Chi, Jianjun Chen

Design and Realization of Integrated Service Access Gateway (ISAG) for Integrated Fusion Shipboard Network

The shipboard network is the carrying platform for the ship's information processing system and is composed of the control network, service-related network, public computing platform network, and communication network. Integrated fusion of multiple networks is a trend for shipboard networks; however, it faces several challenges: how to ensure a uniform service bearer, how to guarantee quality of service, and how to achieve layered security protection and availability. This paper first analyzes the features and design requirements of the integrated fusion shipboard network, and then puts forward an integrated network architecture based on the ISAG and a model for realizing the ISAG. Finally, it describes work on GW80, an ISAG designed for integrated fusion shipboard networks.
Qilin Wang

A Convolutional Neural Networks Accelerator Based on Parallel Memory

Convolutional Neural Networks (CNNs) are among the core algorithms for implementing artificial intelligence (AI) and are characterized by high parallelism and a large amount of computation. With the rapid development of AI applications, general-purpose processors such as CPUs and GPUs cannot meet the performance, power consumption, and real-time requirements of CNNs, whereas ASICs can fully exploit CNN parallelism and improve resource utilization to meet them. This paper designs and implements a new CNN accelerator based on parallel memory technology that supports multiple forms of parallelism. A super processing unit (SPU) with a kernel buffer and an output buffer is proposed to streamline computation and data fetching and thereby ensure the accelerator's performance. In addition, a two-dimensional buffer is proposed that provides conflict-free non-aligned block access with different strides as well as aligned continuous access, meeting the data requirements of various parallelisms. Synthesis results show that the design can work at 1 GHz with an area of 4.51 mm² and an on-chip buffer of 192 KB. We evaluated the design with various CNN workloads, and its efficiency exceeds 90% in most cases. Compared with state-of-the-art accelerator architectures, the design has a smaller hardware cost at the same performance.
Hongbing Tan, Sheng Liu, Haiyan Chen, Honghui Sun, Hongliang Li
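Conflict-free non-aligned block access is typically obtained with a skewed storage scheme. The sketch below uses one simple linear skewing over p·q banks, bank(row, col) = (row·q + col) mod (p·q). Under it, any p×q block at an arbitrary offset and any p·q consecutive elements of a row fall into distinct banks. It is an illustrative scheme, not necessarily the mapping used in the paper, and it covers only unit-stride access.

```python
from itertools import product


def bank_of(row, col, p, q):
    """Linear skewing: element (row, col) is stored in bank (row*q + col) mod (p*q)."""
    return (row * q + col) % (p * q)


def address_in_bank(row, col, p, q, row_len):
    """Offset inside the bank; unique per bank when row_len is a multiple of p*q."""
    if row_len % (p * q):
        raise ValueError("row_len must be a multiple of p*q")
    return (row * row_len + col) // (p * q)


def block_banks(r0, c0, p, q):
    """Banks touched by a p x q block whose top-left corner is (r0, c0), aligned or not."""
    return [bank_of(r0 + i, c0 + j, p, q) for i, j in product(range(p), range(q))]


def row_banks(r0, c0, p, q):
    """Banks touched by p*q consecutive elements of one row starting at (r0, c0)."""
    return [bank_of(r0, c0 + j, p, q) for j in range(p * q)]


def is_conflict_free(banks):
    """True if every access goes to a different bank (one cycle with p*q banks)."""
    return len(set(banks)) == len(banks)
```

The block case works because the offsets i·q + j for 0 ≤ i < p and 0 ≤ j < q cover 0 to p·q - 1 exactly once, so adding any base and reducing mod p·q keeps them distinct.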

RCTS: Random Cyclic Testing System

To fully verify the correctness of a design, random test generators are commonly used in microprocessor verification. In this paper, a random cyclic testing system called RCTS is designed for multi-core and heterogeneous many-core processors. RCTS supports FPGA verification and can verify hardware logic and the integrated implementation at the design stage. RCTS also supports the verification of prototype chips; in system-level hardware-software co-verification in particular, it can find hardware design problems that are hard to expose. RCTS completely eliminates the need for an architectural simulator to compare results, which reduces verification time and improves verification efficiency.
Wang Liyi, Wang Xingyan, Zheng Yan, Shen Li, Tan Jian

Evaluation and Optimization of Interrupt Response Mechanism in RISC-V Architecture

RISC-V (Reduced Instruction Set Computer-Five) is an emerging universal open ISA that aims to become as popular for processors as Linux is for operating systems. Many research institutions and companies currently publish RISC-V processor cores. One of the most important features of a processor is its ability to respond to interrupt events. This paper studies the interrupt mechanism of Hummingbird e203, an open-source RISC-V processor. By analyzing the existing interrupt mechanism, we propose a new interrupt vectorization mechanism that achieves faster interrupt response. We also carry out simulation and logic synthesis for both response mechanisms. Theoretical analysis and evaluation results show that our design is feasible and efficient, improving response speed by 1.6×–3.5×.
Kefan Xu, Yong Li, Bo Yuan, Dongchu Su
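For context, the RISC-V privileged specification already defines a vectored mode for the mtvec register. In Direct mode every trap jumps to BASE; in Vectored mode asynchronous interrupts jump to BASE + 4 × cause, so the handler is reached without a software dispatch on mcause. The sketch below computes the trap target under those rules; the abstract does not say whether the proposed mechanism for the e203 follows this scheme.

```python
MODE_DIRECT = 0
MODE_VECTORED = 1


def trap_target(mtvec: int, mcause: int, xlen: int = 32) -> int:
    """PC a hart jumps to on a trap, following the mtvec rules of the privileged spec."""
    base = mtvec & ~0x3                          # BASE: mtvec with the two MODE bits cleared
    mode = mtvec & 0x3                           # MODE: 0 = Direct, 1 = Vectored, >= 2 reserved
    is_interrupt = (mcause >> (xlen - 1)) & 1    # MSB of mcause marks an interrupt
    code = mcause & ((1 << (xlen - 1)) - 1)      # remaining bits: interrupt/exception code
    if mode == MODE_DIRECT:
        return base
    if mode == MODE_VECTORED:
        # Interrupts jump to BASE + 4*cause; synchronous exceptions still go to BASE.
        return base + 4 * code if is_interrupt else base
    raise ValueError("reserved mtvec MODE")
```

For example, with mtvec = 0x80000001 (vectored) and a machine timer interrupt (cause 7 on RV32), the hart jumps to 0x8000001C.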

Numerical Analysis and Experimental Study on Heat Dissipation Performance of Sealed Rugged Server

With the rapid development of computer technology, the power density of rugged servers keeps increasing, so they must be carefully designed to keep the server temperature within the allowed range. This paper introduces the structure of the rugged server and selects a suitable fan through theoretical analysis and calculation. Using the Icepak thermal simulation software, the rugged server is simulated and optimized, yielding the steady-state temperature and velocity fields inside it. The product prototype is tested at a high temperature of 55 °C. The test results show that the thermal design scheme is feasible: the working temperature meets the derating design requirements, and the reliability of the equipment is improved. The simulation results provide a calculation method and a reference for the thermal design of sealed rugged servers.
Miao Zhang, Fuge Wang
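Fan selection by calculation usually starts from the energy balance of the cooling air: the volumetric flow needed to remove power P with an air temperature rise ΔT is Q = P / (ρ · c_p · ΔT). The sketch below evaluates it. The air properties are assumed values near 30 °C, and the 500 W and 15 K figures in the usage line are hypothetical, not the paper's.

```python
RHO_AIR = 1.16         # kg/m^3, air near 30 °C at sea level (assumed)
CP_AIR = 1007.0        # J/(kg*K), specific heat of air (assumed)
CFM_PER_M3S = 2118.88  # 1 m^3/s is about 2118.88 ft^3/min


def required_airflow_m3s(power_w, delta_t_k, rho=RHO_AIR, cp=CP_AIR):
    """Volumetric airflow that carries power_w away with an air temperature rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return power_w / (rho * cp * delta_t_k)


def m3s_to_cfm(flow_m3s):
    return flow_m3s * CFM_PER_M3S


if __name__ == "__main__":
    flow = required_airflow_m3s(power_w=500.0, delta_t_k=15.0)  # hypothetical load
    print(f"{flow:.4f} m^3/s = {m3s_to_cfm(flow):.1f} CFM")
```

At the 55 °C test ambient the air density drops to about 1.08 kg/m³, which raises the required flow. A fan is then chosen whose operating point on the system impedance curve delivers at least this flow with margin.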

Design of High Performance Server for Shipboard Common Computing Applications

This paper analyzes the advantages of the shipboard “common computing” application model and, in light of the vessel's application environment, the server design requirements and design ideas for “common computing”. On this basis, it introduces the design of the server structure, hardware topology, computing blade, and cloud operating system. Finally, performance tests are performed: the physical performance of the server's compute blade and the performance of the server's virtual machines are measured and compared with the current shipboard computer. The results show that the server has outstanding computing performance, and its virtual machine performance is better than that of the current shipboard computer. In real-time performance, the virtual machine is equivalent to the current shipboard computer. The virtual machine can perform the functions of the physical machine on ships.
Peng Zhang

Automated Deadlock Verification for On-Chip Cache Coherence and Interconnects Through Extended Channel Dependency Graph

Cache coherence and on-chip interconnects are of great importance in many-core systems. Verifying deadlock freedom is challenging, since modern coherence protocols and communication fabrics are becoming increasingly complex. Formal methods play an important role in deadlock verification, but they require extensive modeling effort and long computation times. As a result, formal methods cannot model the system at a fine grain and fail to discover deadlocks introduced by certain implementation details, such as two types of messages sharing a common FIFO or two channels sharing a credit counter. This paper proposes a simple but efficient automated methodology for deadlock verification based on an extended channel dependency graph, which considers not only interconnection nodes but also coherence processing nodes. The methodology allows fast, cross-layer verification of the protocol, the network, and the implementation all at once. It is applied in a case study in which eight 64-core chips cooperate through multiple direct inter-chip links, where it proves generally applicable and shows promising scalability.
Kun Zeng
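The classical result behind channel dependency graphs (Dally and Seitz) is that deterministic routing is deadlock-free if the graph is acyclic, so the core check is cycle detection. The sketch below finds a cycle with depth-first search on a small graph that mixes channel nodes and a coherence-processing node. The node names and the shared-FIFO scenario are hypothetical illustrations of the kind of implementation detail the abstract mentions, not the paper's model.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # node -> nodes it waits on (dependency edges)


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    state = {n: "unvisited" for n in nodes}
    parent: Dict[str, str] = {}

    def dfs(u: str) -> Optional[List[str]]:
        state[u] = "on_stack"
        for v in graph.get(u, []):
            if state[v] == "on_stack":  # back edge closes a cycle v -> ... -> u -> v
                cycle = [u]
                while cycle[-1] != v:
                    cycle.append(parent[cycle[-1]])
                return cycle[::-1]
            if state[v] == "unvisited":
                parent[v] = u
                found = dfs(v)
                if found:
                    return found
        state[u] = "done"
        return None

    for n in sorted(nodes):
        if state[n] == "unvisited":
            found = dfs(n)
            if found:
                return found
    return None


# Hypothetical extended graph: FIFO (channel) nodes plus a directory (coherence) node.
separate_fifos: Graph = {
    "req_fifo": ["directory"],    # queued requests wait for the directory
    "directory": ["fwd_fifo"],    # the directory waits for space to send forwards
    "fwd_fifo": ["owner_cache"],
    "owner_cache": ["resp_fifo"],
}
# Same system, but forwards reuse the request FIFO (a shared-FIFO implementation detail).
shared_fifo: Graph = {
    "req_fifo": ["directory", "owner_cache"],
    "directory": ["req_fifo"],
    "owner_cache": ["resp_fifo"],
}
```

With separate FIFOs the graph is acyclic; sharing the FIFO adds the edge directory → req_fifo, and the search reports the cycle directory → req_fifo → directory.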

