About this book

This book constitutes the refereed proceedings of the 19th CCF Conference on Computer Engineering and Technology, NCCET 2015, held in Hefei, China, in October 2015. The 18 papers presented were carefully reviewed and selected from 158 submissions. They are organized in topical sections on processor architecture; application specific processors; computer application and software optimization; technology on the horizon.

Table of Contents

Frontmatter

Processor Architecture

Frontmatter

Modeling and Analyzing of 3D DRAM as L3 Cache Based on DRAMSim2

A cache memory system with a die-stacked DRAM L3 cache is a promising way to break the Memory Wall and improve performance. To further optimize the existing memory system, this paper models and analyzes a 3D DRAM L3 cache using the DRAMSim2 simulator. To use on-die DRAM as a cache, tags and data are stored together in one DRAM row, and the design exploits the wider bus and higher density of 3D DRAM. The cache modeling platform is evaluated with SPEC2000 traces, generated by gem5, that simulate the access behavior of a core. With the DRAM L3 cache, all test traces show improved performance. Read operations achieve an average speedup of 1.82× over the baseline memory system, and write operations 6.38×. Throughput improves by up to 1.45× compared with the baseline system.

Litiao Qiu, Lei Wang, Qiang Dou, Zhenyu Zhao

Partitioning Methods for Multicast in Bufferless 3D Network on Chip

This paper proposes two region-partitioning multicast routing algorithms for 3D mesh interconnection networks to enhance overall system performance. Both algorithms shorten the latency of long network paths. Simulations with six different synthetic workloads show that the proposed architecture achieves higher system performance than the baseline multicast routing algorithm.

Chaoyun Yao, Chaochao Feng, Minxuan Zhang, Wei Guo, Shouzhong Zhu, Shaojun Wei

Thermal-Aware Floorplanner for Multi-core 3D ICs with Interlayer Cooling

Internal thermal problems have become a critical challenge in multi-core 3D ICs. Interlayer cooling offers a new solution to this problem and expands the design space of multi-core microprocessor floorplans. This work proposes a thermal-aware floorplanner for multi-core 3D ICs with interlayer cooling, using an iterative algorithm based on simulated annealing. The results show that, compared with the baseline design with 3 active device layers, the maximum temperature is reduced by 15 °C and the temperature gradient by 28.4 °C.

Wei Guo, Minxuan Zhang, Peng Li, Chaoyun Yao, Hongwei Zhou

The Improvement of March C+ Algorithm for Embedded Memory Test

March C+ is a commonly used memory test algorithm. Its basic principle is to use finite state machines to read and write all addresses one by one. This paper analyzes the sensitivity conditions of several fault types not covered by the March C+ algorithm and derives a new 22N algorithm, March Y, which increases the fault coverage for WDF, CFdsxwx, and CFwd. March Y has the same symmetry as March C+ and covers all single-cell fault types and coupling faults.

Yongwen Wang, Qianbing Zheng, Yin Yuan
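
The read-and-write-every-address principle described in the abstract above can be illustrated with a small march-test engine. This is only a sketch: it runs the well-known 10N March C- sequence, not the paper's March Y, whose element sequence the abstract does not give. The `Memory` class, its `stuck_at` fault model, and all function names are hypothetical.

```python
UP, DOWN = "up", "down"

# March C- (10N): a standard baseline march test, used here for illustration.
# It is NOT the March Y algorithm proposed in the paper.
MARCH_C_MINUS = [
    (UP,   [("w", 0)]),
    (UP,   [("r", 0), ("w", 1)]),
    (UP,   [("r", 1), ("w", 0)]),
    (DOWN, [("r", 0), ("w", 1)]),
    (DOWN, [("r", 1), ("w", 0)]),
    (UP,   [("r", 0)]),
]


class Memory:
    """Hypothetical bit-addressable memory with optional stuck-at faults."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # address -> value the cell is stuck at

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def run_march(memory, size, elements):
    """Apply every march element to every address.

    Returns a list of (element index, address, expected, observed) mismatches.
    """
    failures = []
    for index, (direction, ops) in enumerate(elements):
        addresses = range(size) if direction == UP else range(size - 1, -1, -1)
        for addr in addresses:
            for op, value in ops:
                if op == "w":
                    memory.write(addr, value)
                else:
                    observed = memory.read(addr)
                    if observed != value:
                        failures.append((index, addr, value, observed))
    return failures


def operation_count(elements, size):
    """Total number of read/write operations (the 'kN' complexity)."""
    return size * sum(len(ops) for _, ops in elements)
```

A fault-free memory yields an empty failure list, while a cell stuck at 1 is reported by every element that expects to read 0 from it.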

Mitigating Soft Error Rate Through Selective Replication in Hybrid Architecture

With the rapid development of integrated circuit technology, soft errors have increasingly become a major factor in microprocessor reliability, and researchers employ a variety of methods to reduce their influence. Besides lower delay and higher bandwidth, 3D integration technology also enables heterogeneous integration. STT-RAM is a new storage technology with broad prospects. Its immunity to soft errors makes it an ideal candidate for improving reliability, and it can be integrated into a 3D chip through heterogeneous integration. This paper proposes a selective replication mechanism for reducing the soft error rate in a hybrid reorder buffer architecture based on 3D integration and STT-RAM. In certain situations, instructions are replicated or migrated to STT-RAM to improve reliability. Experimental results show that the soft error rate of the proposed hybrid structure is reduced by 15 % on average, and that the in-buffer selective replication mechanism reduces the AVF by a further 54.3 % on average, with a performance penalty of 2.8 %.

Chao Song, Minxuan Zhang

Application Specific Processors

Frontmatter

A New Memory Address Transformation for Continuous-Flow FFT Processors with SIMD Extension

The pattern of addresses accessed by one butterfly in FFT processors makes parallel access during computation difficult, and the address reversal at the input or output stage complicates parallel I/O. This paper proposes a new, simple, generalized memory address transformation method that supports parallel access during computation and accelerates 2^n-point mixed-radix FFT on memory-based FFT processors with SIMD extension. To match I/O clock cycles to computation cycles, a new parallel I/O address generation method is also proposed. The proposed methods support maximum-throughput SIMD memory with multi-bank structures and an in-place policy for both I/O and computation with continuous data flow. Most importantly, the address transformation circuit for FFT computation has low complexity, requiring only XOR gates, and the parallel I/O address generation circuit is also simple, requiring only counters.

Chao Yang, Haiyan Chen, Sheng Liu, Sheng Ma
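
To show why an XOR-only address transformation can give conflict-free parallel access, here is a minimal sketch of the classic parity scheme for a two-bank, radix-2, in-place FFT. It is an assumption-laden illustration, not the paper's generalized mixed-radix SIMD transformation: the function names are hypothetical, and only two banks are modeled. In a radix-2 stage the two butterfly operands differ in exactly one address bit, so the XOR of all address bits always differs between them.

```python
def bank_of(address: int) -> int:
    """Bank select = XOR of all address bits (parity); a chain of XOR gates."""
    return bin(address).count("1") & 1


def butterfly_pairs(n_bits: int, stage: int):
    """Yield operand address pairs of one in-place radix-2 FFT stage.

    The two operands of a butterfly differ only in bit `stage`.
    """
    span = 1 << stage
    for addr in range(1 << n_bits):
        if addr & span == 0:
            yield addr, addr | span


def conflict_free(n_bits: int) -> bool:
    """True if, in every stage, both operands of each butterfly sit in different banks."""
    return all(
        bank_of(a) != bank_of(b)
        for stage in range(n_bits)
        for a, b in butterfly_pairs(n_bits, stage)
    )
```

For example, `conflict_free(4)` checks all four stages of a 16-point transform.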

Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs

This paper proposes a parallel algorithm for sparse matrix transposition using the ELLPACK-R format on graphics processing units (GPUs). By exploiting the GPU's high memory bandwidth and texture memory, the algorithm's performance can be improved efficiently. Experimental results show a speedup of up to 8× on an Nvidia Tesla C2070 compared with an implementation on an Intel Xeon E5-2650 CPU. The results also indicate that accelerating transposition is not worthwhile for ELLPACK-R matrices whose number of nonzero elements varies widely among rows.

Song Guo, Yong Dou, Yuanwu Lei, Qiang Wang, Fei Xia, Jianning Chen
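
For readers unfamiliar with the format, the sketch below shows ELLPACK-R (padded value and column-index arrays plus a row-length array `rl`) and a sequential two-pass transposition on it. It is a CPU illustration under simplifying assumptions, not the paper's parallel GPU algorithm: storage is row-major Python lists, padding uses zeros, and the function names are hypothetical.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R.

    Returns (values, cols, rl): padded nonzero values, their column indices,
    and the number of nonzeros in each row.
    """
    rl = [sum(1 for v in row if v != 0) for row in dense]
    width = max(rl, default=0)
    values, cols = [], []
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        pad = width - len(nonzeros)
        cols.append([j for j, _ in nonzeros] + [0] * pad)
        values.append([v for _, v in nonzeros] + [0] * pad)
    return values, cols, rl


def transpose_ellpack_r(values, cols, rl, n_cols):
    """Transpose an ELLPACK-R matrix with n_cols columns."""
    # Pass 1: count nonzeros per column; these are the transpose's row lengths.
    t_rl = [0] * n_cols
    for i, length in enumerate(rl):
        for k in range(length):
            t_rl[cols[i][k]] += 1

    width = max(t_rl, default=0)
    t_values = [[0] * width for _ in range(n_cols)]
    t_cols = [[0] * width for _ in range(n_cols)]

    # Pass 2: scatter each nonzero (i, j) into row j of the transpose.
    fill = [0] * n_cols
    for i, length in enumerate(rl):
        for k in range(length):
            j = cols[i][k]
            t_values[j][fill[j]] = values[i][k]
            t_cols[j][fill[j]] = i
            fill[j] += 1
    return t_values, t_cols, t_rl
```

The padded width equals the longest row, so a matrix with a few very long rows wastes storage and work on padding, which is consistent with the abstract's caution about matrices with widely varying row lengths.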

Channel Estimation in Massive MIMO: Algorithm and Hardware

5G is currently a research hotspot in the communications field, and one of its most promising wireless transmission technologies is massive multiple-input multiple-output (MIMO), which provides high data rates and energy efficiency. The main challenge of massive MIMO is channel estimation, owing to its complexity and to pilot contamination. This paper introduces several improvements to traditional channel estimation methods that address this problem. In addition, hardware acceleration is useful for massive MIMO channel estimation algorithms. We discuss related work on hardware accelerators for matrix inversion and singular value decomposition, which are the main complex operations in channel estimation. We find that the memory system, the network of processing elements, and the numerical precision will be the main research directions for hardware design at large data scales.

Chuan Tang, Cang Liu, Luechao Yuan, Zuocheng Xing

A ML-Based High-Accuracy Estimation of Sampling and Carrier Frequency Offsets for OFDM Systems

This paper addresses the problem of acquiring the sampling frequency offset (SFO) and carrier frequency offset (CFO), which severely degrade the performance of orthogonal frequency division multiplexing (OFDM) systems. Using two identical frequency-domain (FD) long training symbols in the preamble, we propose a novel maximum-likelihood (ML) estimation method that acquires the SFO and CFO simultaneously and extends the estimation methods of Kim and Wang. The main contribution is the use of a first-order Legendre series expansion to obtain the SFO and CFO values in closed form. To evaluate the proposed estimation scheme, we built an OFDM system model according to IEEE 802.11a. The results show that the proposed scheme outperforms the existing schemes.

Cang Liu, Luechao Yuan, Zuocheng Xing, Xiantuo Tang, Guitao Fu

A High-PSRR CMOS Bandgap Reference Circuit

This paper presents a CMOS bandgap reference (BGR) with a high power supply rejection ratio (PSRR). The circuit adopts a pre-regulator. To facilitate comparison, BGRs with and without the pre-regulator are designed and simulated in a standard 0.13 μm CMOS process. Simulation results show that the PSRR of the BGR with the pre-regulator is −107.3 dB, −106.6 dB, and −75 dB at 100 Hz, 1 kHz, and 100 kHz, respectively, whereas the BGR without the pre-regulator achieves only −70.6 dB, −70.5 dB, and −65 dB at the same frequencies. The BGR with the pre-regulator provides a reference voltage of 0.76 V, a temperature coefficient of 0.55 ppm/°C over the range −25 °C to 125 °C, and an output voltage deviation of 0.08 mV when the supply voltage changes from 2.6 V to 6.2 V.

Chang Liping, An Kang, Liu Yao, Liang Bin, Li Jinwen

Computer Application and Software Optimization

Frontmatter

Detection and Analysis of Water Army Groups on Virtual Community

Water armies (groups of paid online posters) are prevalent in social networks and harm public opinion and the security of cyberspace. This paper proposes a novel four-step method for detecting water army groups. First, we divide the virtual community's activity into a series of time windows and find the suspicious periods in which water army groups are active. Second, we build user cooperative networks for the suspicious periods from users' reply behavior and cluster the users by cosine similarity. Third, we prune the cooperative networks, keeping only the edges whose weight exceeds a threshold, to obtain suspicious user clusters. Finally, we analyze the behavior of the users in each cluster in more depth to determine whether they form a water army group. Experimental results show that the method identifies water army groups in virtual communities efficiently and with high accuracy.

Guirong Chen, Wandong Cai, Jiuming Huang, Huijie Xu, Rong Wang, Hua Jiang, Fengqin Zhang
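
The similarity and pruning steps of the method above can be sketched as follows. This is a simplified reading, not the authors' implementation: it assumes each user is represented by a sparse vector of reply counts per thread, uses the cosine similarity itself as the edge weight, and takes connected components of the pruned graph as the suspicious clusters. The data layout, function names, and threshold values are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suspicious_clusters(reply_vectors, threshold):
    """Group users whose reply behavior is similar above `threshold`.

    reply_vectors maps a user id to a dict {thread id: reply count}
    (an assumed representation of reply behavior).
    """
    users = sorted(reply_vectors)

    # Pruning: keep only edges whose weight exceeds the threshold.
    neighbours = {u: set() for u in users}
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            if cosine(reply_vectors[a], reply_vectors[b]) > threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)

    # Connected components with at least two users form candidate clusters.
    seen, clusters = set(), []
    for start in users:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > 1:
            clusters.append(sorted(component))
    return clusters
```

Raising the threshold prunes more edges and yields smaller, tighter clusters; the final behavioral analysis described in the abstract would then be applied to each returned cluster.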

Accelerating Molecular Dynamics Simulations on Heterogeneous Architecture

Molecular dynamics (MD) is an important computational tool used to simulate chemical and physical processes at the molecular level. MD simulations focus on the motion and interaction of numerous molecules or atoms. Most work focuses on accelerating MD on multicore central processing units (CPUs) or on coprocessors such as graphics processing units (GPUs) or Many Integrated Core (MIC) devices [1]. However, most researchers disregard CPU resources and treat the CPU merely as a controller when using coprocessors. Hybrid computing is therefore not achieved, and CPU computing resources are wasted. In this study, we propose three strategies to accelerate MD simulation. The first strategy rewrites the MD code with Compute Unified Device Architecture (CUDA) [2] and runs the application on a single-core CPU-GPU platform. This strategy achieves satisfactory performance but, like most existing work, does not use CPU resources for computation. In the second strategy, after launching the GPU computation, the CPU computes the pair forces of a small portion of the molecules alongside the GPU. The third strategy applies when the GPU is shared by numerous MPI processes, each of which uses the GPU separately; in this situation, performance can also be improved.

Yueqing Wang, Yong Dou, Song Guo, Yuanwu Lei, Baofeng Li, Qiang Wang

A Cloud Server Based on I/O Virtualization in Hardware

With the advent of Internet services and big data, cloud computing has generated much research interest, especially in cloud servers. Building on the development of lightweight server processors, i.e., x86 single-chip processors and ARM64 processors, and of high-performance interconnect fabrics, this paper presents an approach to building a cloud server on top of virtualized I/O. Compared with existing methods, it offers higher performance per cost, higher performance per watt, higher density, and better scalability, and thus better meets the demands of cloud computing.

Yang You, Gongbo Li, Xiaojun Yang, Bowen Qi, Bingzhang Wang

The Evolution of Supercomputer Architecture: A Historical Perspective

Approaches to supercomputer architecture have taken dramatic turns since the earliest supercomputer systems were introduced in the 1960s. Massively parallel processors (MPPs) keep losing ground in the list of the fastest computers, while the rank and share of clusters in the TOP500 list have risen steadily and rapidly. This paper gives perspectives on how supercomputers have evolved over time, presents the architectures in chronological order, and finally analyzes trends in current supercomputer architecture design.

Bao Li, Pingjing Lu

Technology on the Horizon

Frontmatter

Simulation of Six DOF Vibration Isolator System for Electronic Equipment

Electronic equipment is manufactured as a high-precision system, yet it is often used in harsh environments; for example, a computer on a moving carrier is subjected to vibration. The objective of this paper is to provide a systematic investigation of the computer-aided design of vibration isolators that protect electronic equipment in harsh vibration environments. The paper deals with a fast method for solving the natural frequencies and system response of a six-degree-of-freedom (DOF) vibration isolator system. Starting from a mathematical model based on the differential equations of vibratory motion, the state space method is derived and presented. Transforming the vibration isolation differential equations into state space equations makes it convenient to solve for the vibration isolation coefficient of the six-DOF system using the state space method and a MATLAB/Simulink model. The simulation results are consistent with real-world data. The state space method can find further application in the selection of vibration isolation systems and the evaluation of vibration isolation efficiency.

Yufeng Luo, Jinwen Li, Yuanshan Li, Xu Chen
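
The transformation into state space form mentioned in the abstract above follows a standard pattern, sketched here for a generic linear six-DOF system. The symbols are assumptions, since the abstract does not give the paper's notation: $q \in \mathbb{R}^6$ is the displacement vector, $M$, $C$, $K$ are the mass, damping, and stiffness matrices, and $f(t)$ is the excitation.

```latex
% Second-order equation of motion (assumed generic linear form)
M\ddot{q} + C\dot{q} + Kq = f(t)

% State vector and first-order state space form
x = \begin{bmatrix} q \\ \dot{q} \end{bmatrix}, \qquad
\dot{x} =
\underbrace{\begin{bmatrix} 0 & I \\ -M^{-1}K & -M^{-1}C \end{bmatrix}}_{A} x
+ \underbrace{\begin{bmatrix} 0 \\ M^{-1} \end{bmatrix}}_{B} f(t)

% Undamped natural frequencies \omega_i (C = 0, f = 0)
\det\!\left(K - \omega_i^{2} M\right) = 0, \qquad i = 1, \dots, 6
```

In this form the twelve eigenvalues of $A$ carry the natural frequencies and damping of the six modes, and the first-order system can be integrated directly by a numerical solver.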

Impact of Heavy Ion Species and Energy on SEE Characteristics of Three-Dimensional Integrated Circuit

This paper uses Geant4 simulations to characterize single-event effects (SEEs) in each die of a 3DIC under heavy ions of different species and energies. Incident ions with a high atomic number make SEEs more severe in every die, and the SEE characteristics differ markedly between dies after low-energy heavy ions strike the 3DIC. Our research also indicates that the SEE sensitivity of the inner dies is no lower than that of the outer dies unless the heavy ions stop above the inner dies. The reason is that secondary particles induced by nuclear reactions and heavy ions scattered at low incident energy can trigger severe multi-SEEs. We conclude that, if higher reliability is required, the inner dies of a 3DIC also need to be hardened, and techniques that restrain severe multi-SEEs should be applied to them.

Peng Li, Wei Guo, Zhenyu Zhao, Minxuan Zhang

Analysis and Simulation of Temperature Characteristic of Sensitivity for SOI Lateral PIN Photodiode Gated by Transparent Electrode

This paper presents the structure and operating principle of the SOI lateral PIN photodiode gated by a transparent electrode (LPIN PD-GTE). Temperature models of the photocurrent and dark current are presented and validated by 2D ATLAS simulation. The effect of temperature on sensitivity is examined for the fully depleted LPIN PD-GTE. For comparison, the same analysis is carried out for the SOI lateral PIN photodiode. The simulated results indicate that, under 400 nm illumination, the internal quantum efficiency of the SOI LPIN PD-GTE remains at about 95 % as the temperature rises, while the signal-to-noise ratio (SNR) decreases: the SNR is $10^7$ at 300 K and falls to $10^3$ at 473 K. The FWHM is almost unchanged across temperatures. Thus, the sensitivity decreases as the temperature rises. Still, since the operating temperature of the device generally does not reach 473 K or higher, the SOI LPIN PD-GTE can be used at high temperature with good sensitivity.

Bin Wang, Yun Zeng, Guoli Li, Yu Xia, Hui Xu, Caixia Huang

Mitigation Techniques Against TSV-to-TSV Coupling in 3DIC

Through-silicon vias (TSVs) in 3DICs inevitably introduce a large and variable parasitic capacitance, causing serious power/signal integrity (P/SI) problems. This paper presents two methods to mitigate TSV-to-TSV coupling: buffer insertion and shield insertion. Their effects are studied in comparative experiments, and the results show that both methods reduce the coupling capacitance effectively. Factors such as the location, number, and drive capability of the buffers are also discussed. TSV-to-TSV coupling is reduced by up to 99 %. By combining the two methods, a low-cost and effective reduction of TSV-to-TSV coupling can be obtained under practical design constraints, and the approach can also be used in EDA tools.

Quan Deng, Minxuan Zhang, Zhenyu Zhao, Peng Li

Backmatter
