2017 | Book

VLSI Design and Test

21st International Symposium, VDAT 2017, Roorkee, India, June 29 – July 2, 2017, Revised Selected Papers

Edited by: Brajesh Kumar Kaushik, Sudeb Dasgupta, Virendra Singh

Publisher: Springer Singapore

Book series: Communications in Computer and Information Science

About this book

This book constitutes the refereed proceedings of the 21st International Symposium on VLSI Design and Test, VDAT 2017, held in Roorkee, India, in June/July 2017.
The 48 full papers presented together with 27 short papers were carefully reviewed and selected from 246 submissions. The papers were organized in topical sections named: digital design; analog/mixed signal; VLSI testing; devices and technology; VLSI architectures; emerging technologies and memory; system design; low power design and test; RF circuits; architecture and CAD; and design verification.

Table of contents

Frontmatter

Digital Design

Frontmatter
Flexible Composite Galois Field Multiplier Designs

Composite Galois field $$GF((2^m)^n)$$ multiplication denotes multiplication in an extension field over the ground field $$GF(2^m)$$ and is used in cryptography and error-correcting codes. In this paper, versatile and vector composite $$GF((2^m)^2)$$ multipliers are proposed. The proposed versatile $$GF((2^m)^2)$$ multiplier performs $$GF((2^x)^2)$$ multiplication, where $$2\le x\le m$$. The proposed vector $$GF((2^m)^2)$$ multiplier performs $$2^k$$ $$GF((2^{\frac{m}{2^k}})^2)$$ multiplications in parallel, where $$k\in \{0, 1, \ldots, \log_{2}m-1\}$$, with comparatively higher throughput than other designs. In both designs, the high flexibility comes at the cost of additional hardware. The proposed and existing multipliers are synthesised and compared using 45 nm CMOS technology. The throughputs of the proposed parallel and serial vector $$GF((2^8)^2)$$ multipliers are $$72.7\%$$ and $$53.62\%$$ greater, respectively, than that of the Karatsuba-based multiplier design [11].

M. Mohamed Asan Basiri, Sandeep K. Shukla
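
As an illustration of composite-field arithmetic, the sketch below multiplies in $$GF((2^4)^2)$$, i.e. $$m = 4$$ and $$n = 2$$. It assumes the ground-field polynomial $$x^4 + x + 1$$ and the extension polynomial $$y^2 + y + \lambda$$ with $$\lambda$$ = 0xE, which is irreducible over $$GF(2^4)$$ because the trace of 0xE is 1. Both polynomial choices are illustrative and not taken from the paper. The extension-level product uses the Karatsuba trick (three ground-field multiplications instead of four); this is a plain scalar multiplier, not the versatile or vector design proposed above.

```python
GROUND_POLY = 0b10011  # x^4 + x + 1 (assumed ground-field polynomial)
LAMBDA = 0xE           # y^2 + y + LAMBDA, assumed irreducible over GF(2^4) since Tr(0xE) = 1


def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(2^4) modulo x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a             # field addition is XOR
        b >>= 1
        a <<= 1
        if a & 0x10:           # an x^4 term appeared: reduce it
            a ^= GROUND_POLY
    return r


def composite_mul(a, b):
    """Multiply in GF((2^4)^2); an 8-bit int encodes a1*y + a0 with a1 in the high nibble."""
    a1, a0 = a >> 4, a & 0xF
    b1, b0 = b >> 4, b & 0xF
    hh = gf16_mul(a1, b1)
    ll = gf16_mul(a0, b0)
    mid = gf16_mul(a1 ^ a0, b1 ^ b0)   # Karatsuba: a1b1 + a1b0 + a0b1 + a0b0
    high = mid ^ ll                    # a1b1 + a1b0 + a0b1 (from y^2 = y + LAMBDA)
    low = ll ^ gf16_mul(hh, LAMBDA)    # a0b0 + LAMBDA * a1b1
    return (high << 4) | low
```

A quick sanity check on the chosen $$\lambda$$ is that every nonzero 8-bit element has a multiplicative inverse under `composite_mul`; a reducible extension polynomial would produce zero divisors instead.
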
Estimating the Maximum Propagation Delay of 4-bit Ripple Carry Adder Using Reduced Input Transitions

Adders are invariably present in arithmetic units and are needed to implement operations such as addition/subtraction, multiplication and division. Because of the adder's crucial role in the arithmetic unit, its maximum propagation delay must be characterized satisfactorily. Characterizing a 4-bit Ripple Carry Adder (RCA) ideally requires 261,632 input transitions [1], a humongous number. In this paper, we propose a method to estimate the maximum propagation delay of a 4-bit RCA using only 44 input transitions (applied as primary-secondary and subsequently as secondary-primary). We applied the proposed method to 4-bit RCAs designed using seven different Full Adder (FA) circuits and simulated them in LTspice. The results of the proposed method (reduced input transitions) are compared with those obtained by applying all 261,632 possible input transitions to the 4-bit RCA. The simulation results show that the maximum delay estimated by the proposed method is very close to the exact maximum delay of the 4-bit RCA (found by applying all 261,632 input transitions), with a maximum deviation of 5.99%.

Manan Mewada, Mazad Zaveri, Anurag Lakhlani
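
The figure of 261,632 matches the number of ordered pairs of distinct input vectors when the nine primary inputs of a 4-bit RCA (A3 to A0, B3 to B0 and the carry-in) are counted, i.e. $$2^9(2^9 - 1)$$. The snippet below reproduces that count under this assumption; how [1] arrives at the number is not stated here.

```python
def rca_input_transitions(n_bits: int) -> int:
    """Ordered pairs of distinct input vectors for an n-bit RCA with carry-in."""
    inputs = 2 * n_bits + 1          # operand bits of A and B plus the carry-in
    vectors = 2 ** inputs            # all possible input vectors
    return vectors * (vectors - 1)   # initial -> final vector, initial != final


print(rca_input_transitions(4))      # 261632
```

For an 8-bit RCA the same count covers 17 inputs and roughly $$1.7 \times 10^{10}$$ transitions, which is why reduced transition sets matter.
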
VLSI Implementation of Throughput Efficient Distributed Arithmetic Based LMS Adaptive Filter

A new throughput-efficient implementation scheme for the least mean square (LMS) adaptive filter using distributed arithmetic (DA) is presented for IEEE 802.11b PHY scenarios. It is based on pre-computing and storing the filter partial products in lookup tables (LUTs). In contrast to a fixed-coefficient filter, an adaptive filter requires each stored partial product to be updated from time to time. This paper presents a new strategy for DA-based adaptive filters using the offset binary coding (OBC) technique. The proposed strategy eliminates the two oldest samples and allows the LUT to be decomposed into four sub-LUTs. Hence, the proposed approach significantly improves throughput at the cost of a few 2-to-1 multiplexers. Synthesis results show that the proposed scheme occupies almost the same area and improves throughput several-fold. For instance, a 32-tap adaptive filter with the proposed implementation achieves nearly 1.8 MSPS (million samples per second) more throughput than the best existing scheme.

Mohd. Tasleem Khan, Shaik Rafi Ahamed
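
The LUT idea behind DA can be sketched for a conventional fixed-coefficient inner product; this is not the OBC-based adaptive scheme of the paper. Assuming $$B$$-bit two's-complement samples, the LUT stores every possible sum of the weights selected by one bit-slice of the inputs, and the output is accumulated bit-serially, with the sign-bit slice subtracted.

```python
def build_da_lut(weights):
    """LUT[addr] = sum of the weights whose tap bit is set in addr (2**N entries for N taps)."""
    n = len(weights)
    return [sum(w for k, w in enumerate(weights) if (addr >> k) & 1) for addr in range(1 << n)]


def da_inner_product(weights, samples, bits):
    """Compute sum(w_k * x_k) for two's-complement samples of width `bits`, one bit-slice per step."""
    lut = build_da_lut(weights)
    acc = 0
    for b in range(bits):
        # The LUT address is bit b of every sample, one address bit per tap.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        partial = lut[addr] * (1 << b)
        # The sign bit of a two's-complement number has weight -2**(bits-1).
        acc += -partial if b == bits - 1 else partial
    return acc
```

The LUT has $$2^N$$ entries for $$N$$ taps. OBC typically halves that size and sub-LUT decomposition splits it further, which is what makes updating the stored partial products practical in the adaptive case.
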
Realization of Multiplier Using Delay Efficient Cyclic Redundant Adder

Digital adders and multipliers are the backbone of Digital Signal Processing systems. This paper proposes a novel adder that uses the Recursive Doubling technique for carry generation. A multiplier based on the quarter-square algorithm is designed and implemented on a Field Programmable Gate Array (FPGA) using the proposed Cyclic Redundant Adder. The proposed Cyclic Redundant Adder is compared with recent high-performance adders such as the Ling Adder, the Carry Shifting Adder with carry increment and the Carry Look Ahead Adder. The Cyclic Redundant Adder is observed to be the fastest, with the least delay of 2.719 ns for 64-bit inputs.

K. Dheepika, K. S. Jevasankari, Vippin Chandhar, Binsu J. Kailath
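
Two ideas from this abstract can be sketched in a few lines: carry computation by recursive doubling (here a Kogge-Stone-style parallel prefix over generate/propagate pairs, which needs $$\lceil \log_2 n \rceil$$ combining steps for $$n$$ bits), and quarter-square multiplication, which relies on $$ab = \lfloor (a+b)^2/4 \rfloor - \lfloor (a-b)^2/4 \rfloor$$ (exact because $$a+b$$ and $$a-b$$ have the same parity). This is a generic model, not the Cyclic Redundant Adder itself; the 8-bit operand width and table-based squaring are assumptions.

```python
OPERAND_BITS = 8  # illustrative operand width for the multiplier


def recursive_doubling_add(a: int, b: int, width: int = 64) -> int:
    """Add two unsigned `width`-bit ints; carries come from a log-depth (g, p) prefix."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = a ^ b            # per-bit propagate
    G, P = a & b, p      # group generate / propagate, initially per bit
    shift = 1
    while shift < width:
        # Combine each position with the group ending `shift` bits below it.
        G |= P & (G << shift)
        P &= P << shift
        shift <<= 1
    # G bit i is now the carry out of bit i, i.e. the carry into bit i + 1.
    return (p ^ (G << 1)) & mask


# floor(n^2 / 4) for every possible sum of two OPERAND_BITS-bit operands
QUARTER_SQUARES = [(n * n) // 4 for n in range(1 << (OPERAND_BITS + 1))]


def quarter_square_mul(a: int, b: int) -> int:
    """a * b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4) for unsigned operands."""
    s = recursive_doubling_add(a, b, OPERAND_BITS + 1)
    d = abs(a - b)
    return QUARTER_SQUARES[s] - QUARTER_SQUARES[d]
```
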
Fast Architecture of Modular Inversion Using Itoh-Tsujii Algorithm

Modular inversion is a very common primitive in cryptographic computations. It is the most computation-intensive primitive and demands more resources than the others, so modular inversion arithmetic circuits need an optimized architecture to achieve a considerable speed-up. This paper proposes an optimized parallel architecture for the Itoh-Tsujii modular inversion algorithm over the field $$GF(2^{256})$$ by introducing 23 blocks. Compared with the conventional architecture, the results show a 30% reduction in LUT requirement and a 37% reduction in combinational delay.

Pravin Zode, R. B. Deshmukh, Abdus Samad
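
The Itoh-Tsujii algorithm computes $$a^{-1} = \left(a^{2^{m-1}-1}\right)^2$$ in $$GF(2^m)$$ using squarings and a short addition chain of multiplications. The sketch below uses the small field $$GF(2^8)$$ with the AES polynomial $$x^8 + x^4 + x^3 + x + 1$$ for brevity (the paper targets $$GF(2^{256})$$, whose field polynomial is not given here), and it follows the binary expansion of $$m-1$$ rather than the paper's 23-block parallel architecture.

```python
POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1 (AES field), an assumed small field for illustration
M = 8


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^M) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:               # degree-M term appeared: reduce
            a ^= POLY
    return r


def gf_pow2k(a, k):
    """Return a^(2^k) by k repeated squarings."""
    for _ in range(k):
        a = gf_mul(a, a)
    return a


def itoh_tsujii_inverse(a):
    """a^-1 = (a^(2^(M-1) - 1))^2, building beta = a^(2^k - 1) along the bits of M-1."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    beta, k = a, 1                               # beta = a^(2^1 - 1)
    for bit in bin(M - 1)[3:]:                   # bits of M-1 after the leading 1
        beta = gf_mul(gf_pow2k(beta, k), beta)   # now a^(2^(2k) - 1)
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_mul(beta, beta), a) # now a^(2^(k+1) - 1)
            k += 1
    return gf_mul(beta, beta)
```

The chain needs $$m-1$$ squarings and $$\lfloor \log_2(m-1) \rfloor + \mathrm{HW}(m-1) - 1$$ multiplications, where HW is the Hamming weight; for $$m = 256$$ that is 255 squarings and 14 multiplications.
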
Performance Optimized 64b/66b Line Encoding Technique for High Speed SERDES Devices

The 64b/66b technique is conventionally suited to low-BER fiber-optic channels, but it can be extended to higher-BER channels by including a proper error-correcting code and preamble. A modified 64b/66b line encoding technique for the design of high-speed SERDES is proposed. Unlike the earlier 8b/10b technique, run length is no longer guaranteed but is statistically bounded. The generated polynomials are statistically tested in MATLAB prior to VHDL implementation. An optimal choice of primitive polynomial limits the run length to 11 and provides sub-optimal data security. The proposed 64b/66b encoding technique reduces overhead by 15.8% (at 6.3% CRC) with respect to conventional 8b/10b, while also being suited to high-BER channels such as wireless and free space. Offering a performance optimum among security, run length, ISI and DC equalization, this scheme finds potential application in space camera electronics, 5G technology and other IoT applications, such as driverless cars, that must handle large volumes of real-time data with sufficient security over high-BER wireless channels.

Jatindeep Singh, Satyajit Mohapatra, Nihar Ranjan Mohapatra
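
A 64b/66b encoder typically prepends an unscrambled 2-bit sync header to a scrambled 64-bit payload, giving 2/66 ≈ 3.0% header overhead versus 2/10 = 20% for 8b/10b. The sketch below uses the self-synchronous scrambler polynomial $$1 + x^{39} + x^{58}$$ from IEEE 802.3 Clause 49 as a stand-in; the paper selects its own primitive polynomial, which is not reproduced here. `max_run_length` shows how run length can be measured on scrambled output.

```python
SCRAMBLER_LEN = 58  # G(x) = 1 + x^39 + x^58 (IEEE 802.3 Clause 49, used as a stand-in)


def scramble(bits, state=None):
    """Self-synchronous scrambler: s[n] = d[n] ^ s[n-39] ^ s[n-58]."""
    # state[0] is the most recent scrambled bit, state[57] the oldest.
    state = list(state) if state is not None else [0] * SCRAMBLER_LEN
    out = []
    for d in bits:
        s = d ^ state[38] ^ state[57]
        out.append(s)
        state = [s] + state[:-1]
    return out


def descramble(bits, state=None):
    """Inverse: d[n] = s[n] ^ s[n-39] ^ s[n-58], with the *received* bits as state."""
    state = list(state) if state is not None else [0] * SCRAMBLER_LEN
    out = []
    for s in bits:
        out.append(s ^ state[38] ^ state[57])
        state = [s] + state[:-1]
    return out


def max_run_length(bits):
    """Longest run of identical consecutive bits (0 for an empty sequence)."""
    if not bits:
        return 0
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best
```

Because the descrambler's state is built from received bits, it resynchronizes after 58 bits even if it starts from a different state than the transmitter, at the cost of error multiplication on noisy links.
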
A New Multi-objective Hardware-Software-Partitioning Algorithmic Approach for High Speed Applications

Designing embedded systems efficiently has always been of significant interest, and this interest has grown tremendously for contemporary applications, given their increasing complexity and the need to satisfy multiple conflicting constraints. This paper presents a high-speed Hardware Software Partitioning (HSP) technique for the design of such systems. The partitioning problem is modeled as a multi-dimensional optimization problem that aims to minimize the area utilization, power dissipation, execution time and system memory requirement of the implementation. A two-phase algorithm is proposed that also takes into account the communication costs between hardware and software Processing Engines (PEs) during partitioning. A detailed empirical analysis of the proposed algorithm is presented to ascertain its efficiency, quality and speed.

Naman Govil, Rahul Shrestha, Shubhajit Roy Chowdhury
A Framework for Branch Predictor Selection with Aggregation on Multiple Parameters

The performance of a branch predictor is measured not only by its prediction accuracy; parameters such as predictor size, energy expenditure and execution latency also play a key role in predictor selection. Selecting the best predictor while considering all these parameters is therefore non-trivial and is considered one of the foremost challenges. In this paper, we present a framework that systematically addresses this challenge using the concepts of aggregation and unification and selects a predictor based on multiple parameters. We present experimental results of our framework on the Siemens and SPEC 2006 benchmarks.

Moumita Das, Ansuman Banerjee, Bhaskar Sardar
FPGA Implementation of a Novel Area Efficient FFT Scheme Using Mixed Radix FFT

In the literature, the mixed radix FFT scheme has been proposed to compute the FFT in parallel using multiple lower-radix FFT modules. Alternatively, the speed of the FFT can be increased using the Radix-2 decimation-in-frequency (DIF) FFT algorithm with the Multipath Delay Commutator (R2MDC) architecture. In this paper, a novel FFT scheme that combines the R2MDC architecture with the serial version of the mixed radix FFT scheme is proposed. To study the efficacy of this approach, an 8-point FFT is implemented using the R2MDC architecture. Using this, 16-point, 32-point and 64-point FFTs are realized on a Xilinx Virtex-5 FPGA both with the serial version of the mixed radix scheme and using only the R2MDC architecture. The implementation results show that the proposed approach reduces the hardware requirement by 25%–53%, at the cost of speed, compared with other schemes reported in the literature, including the one using only the R2MDC architecture. The proposed scheme is preferred for low-sampling-rate applications such as biomedical signal processing.

Thilagavathy R, Susmitha Settivari, Venkataramani B, Bhaskar M
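
R2MDC pipelines the radix-2 decimation-in-frequency (DIF) butterfly. A recursive software model of that butterfly ordering, checked against a direct DFT, is sketched below; it shows only the radix-2 DIF arithmetic, not the delay-commutator pipeline or the mixed radix decomposition.

```python
import cmath


def fft_dif(x):
    """Radix-2 DIF FFT; len(x) must be a power of two. Output is in natural order."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly: sums feed the even-indexed outputs, twiddled differences the odd ones.
    top = [x[i] + x[i + half] for i in range(half)]
    bottom = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n) for i in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bottom)
    return out


def dft(x):
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)) for j in range(n)]
```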

Analog/Mixed Signal

Frontmatter
Low Voltage, Low Power Transconductor for Low Frequency $$G_m$$-C Filters

A low-voltage, low-power bulk-driven transconductor for low-frequency Transconductance-C ($$G_m-C$$) filters is proposed. The transconductor is designed in UMC 180 nm technology with a supply voltage of 0.5 V. The transconductance ($$G_m$$) is tunable from 12 nS to 100 nS, which is suitable for low-frequency $$G_m-C$$ filters. The power consumption is 120 nW. As an application, a second-order Butterworth low-pass filter (LPF) with a cutoff frequency tunable from 110 Hz to 960 Hz is designed.

Hanumantha Rao G., Rekha S.
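
For intuition on why a tunable $$G_m$$ suits low-frequency filters, consider an ideal $$G_m$$-C integrator (a textbook idealization, not the paper's bulk-driven circuit):

$$H(s) = \frac{G_m}{sC}, \qquad f_u = \frac{G_m}{2\pi C}$$

With the capacitance fixed and all transconductors tuned together, the unity-gain frequency, and with it the cutoff of a filter built from such integrators, scales in proportion to $$G_m$$. The 12 nS to 100 nS range therefore corresponds to a tuning ratio of about 8.3.
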
An Improved Highly Efficient Low Input Voltage Charge Pump Circuit

The efficiency of the conventional charge pump circuit based on dynamic charge transfer switches (CTS) is limited by the threshold voltage of the MOS transistor. This paper proposes an improved dynamic CTS based charge pump circuit that modifies the output stage of the conventional architecture with a PMOS transistor driven by appropriate control signals. A four-stage dynamic CTS based charge pump circuit with a pumping capacitance of 50 pF, a clock frequency of 20 MHz and a load current of 100 µA is designed and simulated in the Cadence environment using UMC 0.18 µm CMOS technology. Compared with the conventional architecture, this modification reduces the output voltage loss from 9% to 1.3% for a 1 V input and from 20% to 6% for a 0.3 V input. The core layout measures 750 µm × 530 µm.

Naresh Kumar, Raja Hari Gudlavalleti, Subash Chandra Bose
A Calibration Technique for Current Steering DACs - Self Calibration with Capacitor Storage

High-resolution DACs require large transistors to obtain the desired accuracy according to the Pelgrom model [1], which increases the area drastically. Several calibration techniques have been investigated to overcome this area-accuracy trade-off. This paper presents a modified self-calibration technique for current-steering (CS) digital-to-analog converters (DACs). In the digital calibration technique, calibrating DACs (CALDACs) are connected across each bit that requires calibration. A high-resolution CALDAC increases accuracy at the cost of increased area. To overcome this problem, the technique is slightly modified: instead of using a 6- or 8-bit CALDAC across each bit, a single CALDAC is used to calibrate each bit, and the equivalent calibrated value is stored as an analog voltage across a capacitor (instead of in digital form within SRAM memory), which is connected in place of the CALDAC through an extra auxiliary transistor. MOSFETs are used as switches for simultaneous switching, and an injection-nulling-switch track-and-hold circuit is used to hold the correct voltage after the switches turn off. To demonstrate this technique, a 10-bit binary-weighted CS DAC is implemented in a 0.18 µm CMOS process. With worst-case process parameter variations, the simulated integral and differential nonlinearities of the calibrated DAC are less than 0.32 LSB.

Pallavi Darji, Chetan Parikh
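
The area penalty mentioned above follows from the Pelgrom mismatch model, stated here in its common threshold-voltage form with $$A_{V_T}$$ a process-dependent constant:

$$\sigma(\Delta V_T) = \frac{A_{V_T}}{\sqrt{WL}} \quad\Longrightarrow\quad WL = \left(\frac{A_{V_T}}{\sigma(\Delta V_T)}\right)^2$$

Halving the mismatch therefore requires four times the gate area, which is why calibration is attractive for high-resolution current-steering DACs.
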
Characterization and Compensation Circuitry for Piezo-Resistive Pressure Sensor to Accommodate Temperature Induced Variation

This paper presents a simple circuit for piezo-resistive pressure sensors that compensates for their temperature dependency. Under constant-voltage excitation, the output of piezo-resistive sensors generally decreases as temperature increases. To control this change with temperature, a varying excitation method is used. The proposed technique uses current-steering DACs and a digital controller to compensate for the variations. The technique is experimentally verified at the hardware level, with the digital control circuit implemented on an FPGA and tested with ASICs comprising the interface circuit. For compensation, temperature is sensed using the same sensor. The temperature resolution is better than 1 °C over a range of 10 °C to 70 °C with the zero pressure correction technique. The test results show that the sensitivity shift and offset shift are compensated by factors of 10 and 44, respectively. The complete fabricated chip, consisting of the interface circuit and the algorithm, occupies an area of 10 mm².

M. Santosh, Anjli Bansal, Jitendra Mishra, K. C. Behra, S. C. Bose
FEM Based Device Simulator for High Voltage Devices

TCAD simulation of electronic devices has always been the basic approach to understanding solid-state electronics and framing the road map for future technology. Designing devices in these materials requires a better understanding of the physics inside the device structure. In such a scenario, a TCAD tool can help visualize the internal dynamics of carriers and fields in the device structure, thus helping to improve it further. Device structures are evolving continuously, which increases the computational complexity of simulation. These simulators face an increasing challenge to improve compact device models while generating precise results, and TCAD designers are increasingly responsible for developing improved solvers with better predictive capabilities. In this work, the performance of a proposed FEM-based simulator is compared with that of an existing conventional device simulator. A simple pn junction diode is designed in both simulators, and different electrical properties are compared by incorporating similar models and exactly the same material parameters.

Ashok Ray, Gaurav Kumar, Sushanta Bordoloi, Dheeraj Kumar Sinha, Pratima Agarwal, Gaurav Trivedi
Synapse Circuits Implementation and Analysis in 180 nm MOSFET and CNFET Technology

Neuromorphic hardware circuits and systems emulate the operational and organizational principles of biological networks. The basic components of these large-scale networks are neurons and synapses. Synapses serve as interconnections between neurons for computation and information transfer in both real and artificial neural systems. Synapses in neuronal networks can be static, with a constant gain, or dynamic, with the synaptic strength modified during computation. In short-term dynamic plastic synapses, the synaptic strength changes on time scales of milliseconds to minutes, and the change is reversible. Short-term dynamic synapses can be either depressing, when the synaptic strength decreases, or facilitating, when it increases. In this paper, we study a static synapse and a short-term dynamic depressing synapse circuit already reported in the literature. We ported these circuits to 180 nm MOSFET and CNFET technologies and studied their functionality, average power consumption and area occupancy. The simulations in this work were carried out using HSPICE.

Sushma Srivastava, S. S. Rathod
A 10 MHz, 73 ppm/°C, 84 µW PVT Compensated Ring Oscillator

A 10 MHz, 84 μW PVT compensated ring oscillator is presented in 0.11 μm BCD9S (Bipolar CMOS DMOS) technology. The proposed ring oscillator is inherently temperature compensated and shows a frequency deviation around 10 MHz of ±0.7% in the typical corner, ±2.25% in the slow corner and ±0.75% in the fast corner across −40 °C to 150 °C at a regulated supply of 1.8 V. The proposed oscillator is less sensitive to PVT variations and requires less area than state-of-the-art oscillators.

Vivek Tyagi, M. S. Hashmi, Ganesh Raj, Vikas Rana
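
Assuming the title's 73 ppm/°C is a box-method figure (peak-to-peak frequency spread divided by the nominal frequency and the temperature span), it follows from the typical-corner ±0.7% over −40 °C to 150 °C: 1.4% / 190 °C ≈ 73.7 ppm/°C. The same arithmetic for all three corners is sketched below; the slow and fast corners come out at about 237 and 79 ppm/°C.

```python
def tempco_ppm_per_c(deviation_pct: float, t_min_c: float, t_max_c: float) -> float:
    """Box-method temperature coefficient in ppm/°C from a symmetric ±deviation_pct spread."""
    spread = 2 * deviation_pct / 100.0          # ±x % is a 2x % peak-to-peak spread
    return spread / (t_max_c - t_min_c) * 1e6


for corner, dev in (("typical", 0.7), ("slow", 2.25), ("fast", 0.75)):
    print(f"{corner}: {tempco_ppm_per_c(dev, -40, 150):.1f} ppm/°C")
```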

VLSI Testing

Frontmatter
Deterministic Shift Power Reduction in Test Compression

Over the years, semiconductor designs have grown to multi-million gates. With increasing design sizes, reducing power consumption has become a key challenge. Power consumption is higher in test modes, as all the logic blocks are used simultaneously. Some techniques to save test-mode power during shift and capture cycles are already in use, but they are not deterministic and do not provide a user control mechanism. This paper proposes a mechanism called Shift Power Chain (SPC) to deterministically control and reduce shift power in test compression mode. The mechanism provides a significant reduction in peak and average shift power. We present experimental results on large-scale industrial designs as well as ISCAS’89 and OpenCores benchmarks.

Kanad Basu, Rishi Kumar, Santosh Kulkarni, Rohit Kapur
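
A common way to estimate scan shift power, independent of the SPC mechanism, is the weighted transition metric (WTM): a transition between consecutive bits of a scan-in vector toggles one more cell for every remaining shift cycle, so transitions that enter the chain early cost more. The sketch assumes `vector[0]` is the first bit shifted in and a single chain as long as the vector.

```python
def weighted_transitions(vector):
    """WTM of one scan-in vector; vector[0] is the first bit shifted into the chain."""
    length = len(vector)
    # A transition between bits i and i+1 ripples through length-1-i cells before load ends.
    return sum(length - 1 - i for i in range(length - 1) if vector[i] != vector[i + 1])
```

Summing this over all vectors gives an average-power proxy, and the per-cycle maximum gives a peak-power proxy; filling don't-care bits to reduce it is one classic use.
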
Pseudo-BIST: A Novel Technique for SAR-ADC Testing

This paper presents an improved approach for testing and measuring the parameters of an Analog to Digital Converter (ADC). The proposed methodology, Pseudo-BIST, combines ATE (Automatic Test Equipment) and BIST (Built-In Self-Test). Pseudo-BIST provides a novel multi-processing technique in which data conversion and the calculation of static parameters take place simultaneously. The proposed method has been applied to a SAR-ADC, reducing test time by more than 76% for a single-site SAR-ADC and by 93% for 8-site ADCs. Pseudo-BIST also achieves a 50–70% reduction in area overhead compared with a BIST that includes a high-precision DAC.

Yatharth Gupta, Sujay Deb, Vikrant Singh, V. N. Srinivasan, Manish Sharma, Sabyasachi Das
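
The static parameters mentioned above are commonly obtained with a code-density (histogram) test: with a slow, uniform input ramp, each code's hit count is proportional to its width. The sketch below is that textbook computation, not Pseudo-BIST's; it drops the two end codes, which also collect over-range samples, and reports INL as the running sum of DNL without end-point correction.

```python
from itertools import accumulate


def dnl_inl_from_histogram(hist):
    """DNL/INL in LSB from a uniform-ramp code histogram (end codes excluded)."""
    inner = hist[1:-1]                       # end codes also collect over-range samples
    if not inner:
        raise ValueError("need at least three codes")
    avg = sum(inner) / len(inner)            # ideal number of hits per code
    dnl = [h / avg - 1.0 for h in inner]     # width error of each code in LSB
    inl = list(accumulate(dnl))              # running sum, no end-point correction
    return dnl, inl
```

A code with a DNL of −1 LSB received no hits, i.e. it is a missing code.
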
SFG Based Fault Simulation of Linear Analog Circuits Using Fault Classification and Sensitivity Analysis

This paper presents a new approach for simulating single analog faults in linear analog circuits modeled in the MATLAB/Simulink environment. The proposed approach consists of modeling the fault-free and faulty analog circuits in MATLAB/Simulink using a signal flow graph, simulating both models with an input test stimulus, and identifying the presence of faults by comparing the maximum error voltage measured at the output with a predefined threshold. Parametric faults are modeled in terms of component tolerances, and catastrophic faults are considered in terms of short and open faults. The proposed approach first identifies the type of fault to be simulated and, based on the fault type, builds the signal flow graph (SFG) of the faulty circuit. We also show that parametric faults are sensitive to the frequency of the input test sinusoid. The proposed approach exploits ‘PSpice® Advanced Sensitivity Analysis’, which identifies less sensitive parametric faults at a given input sinusoid frequency. The effectiveness of the proposed method is verified by fault modeling and fault simulation of a biquadratic filter and a leapfrog filter circuit. The proposed fault simulation method provides a speedup over traditional circuit simulators such as PSPICE.

Rahul Bhattacharya, S. H. M. Ragamai, Subindu Kumar
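
The detection criterion described above (maximum output error against a threshold) can be illustrated in closed form for a hypothetical first-order RC low-pass filter. For a sinusoidal stimulus of amplitude $$A$$, the difference between the faulty and fault-free outputs is itself a sinusoid of amplitude $$A\,|H_f(j\omega) - H_0(j\omega)|$$, so the test frequency directly controls whether a parametric fault is visible. This is an analytic shortcut, not the paper's MATLAB/Simulink SFG flow; the component values and the 50 mV threshold are illustrative.

```python
import math


def rc_lowpass_response(r_ohm, c_farad, freq_hz):
    """Complex gain of a first-order RC low-pass filter: H(jw) = 1 / (1 + jwRC)."""
    w = 2 * math.pi * freq_hz
    return 1 / (1 + 1j * w * r_ohm * c_farad)


def max_error_voltage(nominal, faulty, freq_hz, amplitude=1.0):
    """Peak of v_faulty(t) - v_nominal(t) for a sinusoid: amplitude * |H_f - H_0|."""
    h_0 = rc_lowpass_response(*nominal, freq_hz)
    h_f = rc_lowpass_response(*faulty, freq_hz)
    return amplitude * abs(h_f - h_0)


def is_detected(nominal, faulty, freq_hz, threshold, amplitude=1.0):
    """Flag the fault when the maximum error voltage exceeds the threshold."""
    return max_error_voltage(nominal, faulty, freq_hz, amplitude) > threshold
```

With $$R$$ = 1 kΩ and $$C$$ = 1 µF, a +20% resistor drift produces an error of roughly 1 mV at 1 Hz but about 90 mV near the 159 Hz corner, so only the second test frequency exposes it with a 50 mV threshold.
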
A Cost Effective Technique for Diagnosis of Scan Chain Faults

Scan-based diagnosis plays a critical role in failure mode analysis for yield improvement. However, since the logic circuitry associated with scan chains constitutes a significant fraction of a chip’s total area, the scan chain itself can be subject to defects. In some cases, scan chain failures have been observed to account for up to $$50\%$$ of total chip failures. Hence, scan chain testing and diagnosis have become crucial in recent years. This paper proposes a hardware-assisted, low-complexity and area-efficient scan chain diagnosis technique. The proposed technique is simple to implement and provides maximum diagnostic resolution for stuck-at faults. It can be further extended to diagnose timing faults in the scan chain.

Satyadev Ahlawat, Darshit Vaghani, Jaynarayan Tudu, Ashok Suhag
Multi-mode Toggle Random Access Scan to Minimize Test Application Time

Random Access Scan (RAS) as a design-for-test technique has recently gained importance because of its ability to update each flip-flop independently, which drastically reduces test application time compared with the traditional Serial Scan technique. In this paper, we propose a Multi-Mode Toggle RAS architecture that reduces test application time using a T-flip-flop based cell design. More importantly, the proposed RAS architecture can update multiple flip-flops together, further reducing test application time. The proposed architecture has two test-mode operations: in direct test mode, multiple flip-flops are toggled together, whereas in decoder test mode only one flip-flop is toggled at a time. An algorithm for placing the scan flip-flops is also proposed for optimal performance in the proposed architecture. Experimental results show an average reduction of 56% in test data volume compared with the traditional RAS architecture, and an average speedup of 2.7x in test application time.

Anshu Goel, Rohini Gulve
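
With T-flip-flop cells, a RAS only needs to address the cells whose value must change, so the cost of moving from one scan state to the next is the Hamming distance between them. The sketch below counts those toggles in a one-cell-at-a-time (decoder-style) mode, treating each state as an integer bit vector; it ignores captured responses as well as the paper's multi-cell direct mode and placement algorithm.

```python
def toggle_addresses(current, target):
    """Flip-flop indices a toggle-based RAS must address to move from current to target."""
    diff = current ^ target
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def toggle_cost(vectors, initial=0):
    """Total single-cell toggle operations to apply the vectors in order."""
    state, total = initial, 0
    for v in vectors:
        total += len(toggle_addresses(state, v))
        state = v
    return total
```

Ordering vectors (or placing flip-flops) so that consecutive states differ in few bits, or in bits that can be toggled together, is what reduces this count.
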
Performance Analysis of Disability Based Fault Tolerance Techniques for Permanent Faults in Chip Multiprocessors

Dynamic Voltage and Frequency Scaling (DVFS), used to reduce power dissipation in multicore chips, causes cell failures in cache memory. Various fault tolerance techniques have been introduced, and analysing their impact has become necessary. Since disabling techniques have the lowest overhead, this work analyses their performance in multicore chips. The Expected Miss Ratio for Multicore $$(EMR_{MC})$$ is proposed as a function of the Probability of Cell Failure ($$P_{fail}$$) and evaluated. Single-core and multicore system configurations are simulated separately to compare the results. The Expected Miss Ratio is hardly affected below the lower bound of $$P_{fail}$$, i.e. 1e-5, where $$EMR_{MC}$$ remains lower than the Expected Miss Ratio for Singlecore ($$EMR_{SC}$$) by a static difference. Above the lower bound, both $$EMR_{SC}$$ and $$EMR_{MC}$$ start increasing, and for $$P_{fail}$$ higher than 1e-3, i.e. the upper bound, $$EMR_{MC}$$ often converges with $$EMR_{SC}$$. Within these bounds, $$EMR_{MC}$$ remains up to 19.3% lower than $$EMR_{SC}$$.

Avishek Choudhury, Biplab K. Sikdar
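
Block disabling discards any cache line that contains at least one faulty cell. Assuming independent cell failures and 512-bit (64-byte) data lines, both assumptions for illustration, the fraction of disabled lines is $$1 - (1 - P_{fail})^{512}$$. This gives a feel for the bounds above: about 0.5% of lines are lost at $$P_{fail} = 10^{-5}$$ and about 40% at $$10^{-3}$$. The paper's $$EMR$$ metric itself is not reproduced.

```python
def line_disable_probability(p_fail: float, bits_per_line: int = 512) -> float:
    """P(line has at least one faulty cell), assuming independent cell failures."""
    return 1.0 - (1.0 - p_fail) ** bits_per_line


def expected_usable_lines(p_fail: float, lines: int, bits_per_line: int = 512) -> float:
    """Expected number of cache lines that survive block disabling."""
    return lines * (1.0 - line_disable_probability(p_fail, bits_per_line))
```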

Devices and Technology – I

Frontmatter
Low-Power Sequential Circuit Design Using Work-Function Engineered FinFETs

Sequential circuits such as pulsed latches and semi-dynamic flip-flops are extensively used in state-of-the-art high-performance microprocessors. In this paper, we propose a novel approach that exploits the metal gate work function to reduce the power consumption and area of pulsed latches and semi-dynamic flip-flops built with FinFETs. Compared with designs using standard shorted-gate FinFETs, the proposed pulsed latch reduces dynamic and leakage power by 37% and 42%, respectively. Similarly, the proposed semi-dynamic flip-flop reduces dynamic and leakage power consumption by 24% and 32%, respectively, compared with the standard design. The proposed circuits also show a significant improvement in static noise margin and a reduction in area.

Ashish Soni, Abhijit Umap, Nihar R. Mohapatra
Vertical Nanowire FET Based Standard Cell Design Employing Verilog-A Compact Model for Higher Performance

At the sub-10 nm technology node, the vertical silicon nanowire (VNW) FET has become a promising substitute owing to its better gate controllability, short-channel immunity, high ION/IOFF ratio and CMOS compatibility. This paper presents a standard cell library using a physics-based Verilog-A compact model for a 10 nm vertical SiNW FET. The unified compact model includes all the nanoscale effects (e.g. short-channel effects, mobility degradation and velocity saturation) as well as parasitic capacitance and resistance models, which are highly dominant at lower technology nodes. The compact model matches TCAD simulation data well at the 10 nm VNW FET device level. The cell library comprises INVERTER, NAND, NOR and Ex-OR gate cells. Further, we compare the performance of the 10 nm VNW FET based standard cells with a 45 nm bulk CMOS standard cell library. The VNW FET based cell library is found to offer ~4X lower delay and ~14X lower power consumption than the 45 nm CMOS technology.

Satish Maheshwaram, Om Prakash, Mohit Sharma, Anand Bulusu, Sanjeev Manhas
Analysis of Electrolyte-Insulator-Semiconductor Tunnel Field-Effect Transistor as pH Sensor

In this paper, a Silicon on Insulator (SOI) Electrolyte Insulator Semiconductor (EIS) Tunnel Field Effect Transistor (TFET) is analyzed for pH sensing applications using the 3-D device simulator “Sentaurus”. The electrolyte region is modeled as an intrinsic semiconductor material in which the electron and hole charges represent the mobile ions in the aqueous solution. The dielectric constant, energy bandgap and electron affinity of the electrolyte region are 78, 1.5 eV and 1.32 eV, respectively. The effect of pH on device electrostatics, such as surface potential, threshold voltage and drain current, is examined. The pH response is defined as the threshold voltage shift when the pH of the injected solution is varied from lower to higher values.

Ajay Singh, Rakhi Narang, Manoj Saxena, Mridula Gupta
Exploiting Characteristics of Steep Slope Tunnel Transistors Towards Energy Efficient and Reliable Buffer Designs for IoT SoCs

Energy-efficient buffer circuits enable high-speed and reliable information transfer among the sub-systems of a System on Chip (SoC). A novel buffer circuit design exploiting the steep slope characteristics of tunnel FETs (TFETs) is proposed and benchmarked against 20 nm Si FinFET technology. The analysis considers iso-area, iso-energy, iso-speed and noise margins for energy efficiency and reliability. TFET buffers exhibit higher speed and energy efficiency than FinFET buffers at scaled supply voltages, demonstrating their suitability for applications such as Internet of Things (IoT) SoCs. To further exemplify the buffer circuit performance, a TFET/FinFET pass transistor based full adder carry circuit is implemented whose output load is driven by a TFET/FinFET buffer. Unlike FinFET buffer circuits, TFET buffers prove to be reliable and energy efficient in driving larger loads, despite the area overhead caused by the unidirectional current conduction of TFETs.

Japa Aditya, Vallabhaneni Harshita, Ramesh Vaddi
An Efficient VLSI Architecture for PRESENT Block Cipher and Its FPGA Implementation

Lightweight cryptography plays an essential role in emerging authentication-based pervasive computing applications in resource-constrained environments. In this paper, we propose resource-efficient and high-performance VLSI architectures for the PRESENT block cipher for the two key lengths of 80 and 128 bits, namely PRESENT-80 and PRESENT-128. These architectures have been implemented on a Xilinx Virtex-5 XC5VFX70T-1-FF1136 FPGA device based on LUT-6 technology. They have a latency of 33 clock cycles, run at a maximum clock frequency of 306.84 MHz and provide a throughput of 595.08 Mbps. They have been compared with two established architectures. The PRESENT-80 architecture consumes 20.3% fewer FPGA slices with a 25.4% gain in throughput. Similarly, the PRESENT-128 architecture requires 20.7% fewer FPGA slices, along with a 27.7% reduction in latency and an overall 69.1% increase in throughput.

Jai Gopal Pandey, Tarun Goel, Abhijit Karmakar
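
Two quick checks for the numbers above. First, with PRESENT's 64-bit block, 64 bits × 306.84 MHz / 33 cycles ≈ 595.08 Mbps, so the reported throughput corresponds to one block per 33-cycle latency. Second, a bit-accurate software model is handy as a golden reference when verifying such architectures; the sketch below follows the published PRESENT-80 specification (31 rounds of addRoundKey, sBoxLayer and pLayer, then a final key addition) and is not derived from the paper's hardware.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1


def s_layer(state):
    """Apply the 4-bit S-box to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for i in range(16):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def p_layer(state):
    """Bit permutation: bit i moves to 16*i mod 63 (bit 63 stays in place)."""
    out = 0
    for i in range(64):
        pos = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << pos
    return out


def present80_encrypt(plaintext, key):
    """Encrypt a 64-bit block with an 80-bit key (ints, MSB first)."""
    state = plaintext
    for rnd in range(1, 32):
        state ^= key >> 16                                      # round key: top 64 key bits
        state = p_layer(s_layer(state))
        key = ((key << 61) | (key >> 19)) & MASK80              # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1)) # S-box on top nibble
        key ^= rnd << 15                                        # round counter into bits 19..15
    return state ^ (key >> 16)


def throughput_mbps(block_bits, f_mhz, latency_cycles):
    """Throughput of a non-pipelined core producing one block per latency period."""
    return block_bits * f_mhz / latency_cycles
```
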
Investigation of TCADs Models for Characterization of Sub 16 nm InGaAs FinFET

At the sub-16 nm In$$_{0.53}$$Ga$$_{0.47}$$As FinFET technology node, device fabrication is complex in many respects, and the study of such devices is possible only through TCAD simulations. To understand the behavior of such devices, the TCAD tool has to incorporate various simulation models related to semiconductor physics and device geometry. In this paper, we calibrate a 50 nm In$$_{0.53}$$Ga$$_{0.47}$$As FinFET against experimental results using various simulation models and then use the same models to characterize the $$I_{d}-V_g$$ and $$I_{d}-V_{d}$$ characteristics, along with the short-channel parameters, of the sub-16 nm In$$_{0.53}$$Ga$$_{0.47}$$As FinFET. The analysis is done on two types of devices, i.e. raised S/D with and without nitride spacers. The subthreshold slope SS and DIBL of the raised S/D In$$_{0.53}$$Ga$$_{0.47}$$As FinFET with spacers are 65.48 mV/dec and 38.4 mV/V, respectively, while without spacers they are 84.45 mV/dec and 44 mV/V.

J. Pathak, A. Darji
Hausdorff Distance Driven L-Shape Matching Based Layout Decomposition for E-Beam Lithography

Layout decomposition is a basic step in mask data preparation for e-beam lithography (EBL) writing. For higher throughput in EBL, an L-shape writing technique has recently been developed. It is termed L-shape fracturing, by analogy with rectangular fracturing. However, this new technique may yield very thin or narrow features called slivers. For better manufacturability, it is preferable to minimize the overall sliver length. In this paper, we propose a novel scheme based on the Hausdorff distance metric for L-shape fracturing with inherent sliver minimization. The proposed scheme first finds the concave corner vertices of the input layout and attempts to find a balanced partition of this set of concave corner points. Hausdorff distance-based layout fracturing is then performed. Experimental results demonstrate the efficacy of the proposed algorithm.

Arindam Sinharay, Pranab Roy, Hafizur Rahaman
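The Hausdorff distance used by the L-shape fracturing scheme above measures how far apart two point sets are. As a minimal sketch, assuming Euclidean distance between concave corner points given as (x, y) tuples (the paper may use a different metric, and its partitioning and fracturing steps are not shown), the symmetric Hausdorff distance can be computed as follows; the point sets are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical concave corners of two candidate partitions of a layout.
left = [(0.0, 0.0), (2.0, 0.0)]
right = [(0.0, 1.0), (2.0, 4.0)]
print(hausdorff(left, right))  # prints 4.0
```

The directed distance is asymmetric (here 2.236 one way and 4.0 the other), which is why the symmetric form takes the maximum of both directions.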

VLSI Architectures

Frontmatter
Energy-Efficient VLSI Architecture & Implementation of Bi-modal Multi-banked Register-File Organization

When executing present-day high-end applications, a processor consumes considerable energy, and a significant fraction of it is due to intensive register-file access. This fraction does not shrink with advances in semiconductor technology, so it is essential to design energy-efficient register-file architectures. This paper presents a new register-file architecture, called the bi-modal multi-banked register-file organization, that captures short-term reused and short-lived operands to relieve the register file of read and write load. Additionally, the instruction decode stage of the processor is restructured to capture the reused and short-lived register operands. Incorporating these features, we have designed a processor architecture that has been synthesized and post-layout simulated in a 180 nm complementary metal-oxide-semiconductor (CMOS) technology node. It consumes 35 mW of total power at a 200 MHz clock frequency. The bi-modal multi-banked register-file organization stores the fraction of the data bandwidth that is local to the functional units, reducing the cost of supplying data to the execute stage. When executing MiBench benchmark kernels, the proposed architecture shows up to 55% improvement in energy saving over an embedded reduced instruction-set computer (RISC) processor architecture.

Sumanth Gudaparthi, Rahul Shrestha
Performance-Enhanced $$d^2$$-LBDR for 2D Mesh Network-on-Chip

Growing demand for high-performance computing is necessitating faster on-chip communication. Network-on-Chip (NoC), which applies networking theory and methods to on-chip communication, has emerged as a potential option. As transistors scale down to sub-micron technologies, NoCs also suffer from permanent and transient failures. Logic Based Distributed Routing (LBDR) has been proposed as a flexible fault-tolerant routing implementation framework for mesh-based NoCs with link and router faults. Its routing logic overhead is independent of the topology size, which makes it scalable. LBDR is restricted to minimal paths and cannot support all failures. $$d^2$$-LBDR was developed to support non-minimal paths and thus handles all single and double permanent link failures. Although $$d^2$$-LBDR covers all single and double permanent link failures, it still restricts the number of available fault-free paths. In this paper, we show how this limitation on the number of available fault-free paths affects NoC performance. Based on our analysis, we present a new selection logic that enables $$d^2$$-LBDR to explore all available fault-free paths. With marginal area and power overhead, the proposed solution provides higher performance ($$7\%$$ improvement in average flit latency and $$4\%$$ improvement in average network throughput when subject to two link faults in a 64-node NoC).

Anugrah Jain, Vijay Laxmi, Meenakshi Tripathi, Manoj Singh Gaur, Rimpy Bishnoi
ACAM: Application Aware Adaptive Cache Management for Shared LLC

Modern Chip Multiprocessors (CMPs) are typically multicore systems with a shared last level cache (LLC). Effective utilization of the shared cache can be a challenge when the demands of competing applications conflict with each other. At times, to accommodate new data required by one application, another application's useful data may be evicted. Such negative interference increases memory misses and degrades system performance. Hence, a technique is required that optimally manages the LLC even in the presence of such conflicting demands. Various LLC management techniques have been proposed to efficiently manage shared caches. State-of-the-art replacement policies such as Static Re-reference Interval Prediction (SRRIP) and Application Aware Behavior Re-reference Interval Prediction (ABRip) evict cache blocks based on their predicted reuse in the near future. SRRIP makes replacement decisions on a per-block basis, whereas ABRip also considers the cache behavior of each application to minimize conflicting data demands. As a result, ABRip outperforms SRRIP for workload mixes in which one application is cache-friendly and the other is streaming. However, ABRip does not perform well when both applications in the mix are cache-friendly. We propose the Application Aware Adaptive Cache Management (ACAM) policy, which adapts to both types of workload mixes. On a CMP system running SPEC CPU2006 workloads, the proposed replacement policy reduces LLC misses per kilo instruction (MPKI) by up to 22.74% and 12.7% compared with SRRIP and ABRip, respectively. The policy effectively utilizes the shared LLC and outperforms both SRRIP and ABRip, with performance gains of up to 10.12% and 9.36%, respectively.

Sujit Kr Mahto, Newton
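SRRIP, the baseline policy named in the ACAM abstract above, keeps an M-bit re-reference prediction value (RRPV) per cache block. The sketch below models one cache set with the commonly described SRRIP rules: new blocks are inserted with RRPV = 2^M - 2, a hit resets the RRPV to 0, and the victim is a block with RRPV = 2^M - 1, aging all blocks until one exists. It does not implement ABRip or the proposed ACAM policy, and the class name is hypothetical.

```python
class SRRIPSet:
    """One set of a cache managed by Static RRIP with M-bit RRPVs."""

    def __init__(self, ways, m_bits=2):
        self.max_rrpv = (1 << m_bits) - 1      # "distant" re-reference prediction
        self.tags = [None] * ways              # None marks an empty way
        self.rrpv = [self.max_rrpv] * ways     # empty ways are evicted first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is filled on a miss)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0  # hit: predict near-immediate reuse
            return True
        victim = self._find_victim()
        self.tags[victim] = tag
        self.rrpv[victim] = self.max_rrpv - 1    # insert with a "long" re-reference interval
        return False

    def _find_victim(self):
        while True:
            for way, value in enumerate(self.rrpv):
                if value == self.max_rrpv:
                    return way
            # No block predicted "distant": age every block and search again.
            self.rrpv = [v + 1 for v in self.rrpv]

# Usage on a hypothetical 2-way set: B is evicted because A was re-referenced.
s = SRRIPSet(ways=2)
for t in ["A", "B", "A", "C"]:
    s.access(t)
print(s.tags, s.rrpv)
```

Inserting at 2^M - 2 rather than 0 is what lets SRRIP resist streaming blocks: a block that is never reused reaches the eviction value after a single aging step.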
Adaptive Packet Throttling Technique for Congestion Management in Mesh NoCs

Network-on-Chip (NoC) is an emerging communication framework for multi-core systems. With increasing core counts and complex workloads, congestion management in NoCs is receiving growing research attention. Packet throttling is a cost-effective congestion management technique. It delays packet injection into the network, thereby regulating network traffic and easing the movement of packets generated by other critical applications. Choosing the point of throttling and the rate of throttling are two major design issues that can impact the performance and stability of any throttling algorithm. Existing state-of-the-art throttling techniques use local throttling decisions coordinated by a single central controller. We overcome the issues of this approach by partitioning the network into a number of subnetworks, each with its own zonal controller. Experimental results on an 8 $$\times $$ 8 2D mesh with real traffic workloads from SPEC CPU2006 benchmarks show an average packet latency reduction of 6.2% compared with state-of-the-art packet throttling techniques.

N. S. Aswathy, R. S. Reshma Raj, Abhijit Das, John Jose, V. R. Josna
Defeating HaTCh: Building Malicious IP Cores

The possibility of Hardware Trojans (HTs) being present in SoCs designed by integrating hundreds of third-party IP (3PIP) cores from different vendors is well documented. Our focus in this paper is to highlight the vulnerability of such SoCs to HTs. We do this by demonstrating a novel approach to designing a simple HT with an extremely small footprint. We present a detailed discussion to demonstrate that HaTCh, one of the latest and best HT detection techniques, fails to detect our Trojan. The paper concludes by highlighting the vulnerabilities of SoCs designed with 3PIP cores and the need for trusted IP cores.

Anshu Bhardwaj, Subir Kumar Roy
Low Cost Circuit Level Implementation of PRESENT-80 S-BOX

The PRESENT-80 algorithm is based on a Substitution-Permutation Network (SPN) with a 64-bit data size and an 80-bit key size. While the permutation operation can be performed by simple wiring, the substitution operation (S-box) is the only non-linear component and consumes the most resources. Existing works in the literature concentrate on the algorithmic implementation of PRESENT. This work is the first of its kind to explore the circuit-level implementation of the PRESENT algorithm by identifying an optimized architecture for the S-box. This is achieved by realizing the PRESENT S-box using static CMOS logic styles in 180 nm technology. Comparison results for two different PRESENT S-box architectures using static CMOS logic styles are tabulated.

S. Shanthi Rekha, P. Saravanan
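For reference, the 4-bit PRESENT S-box and the bit-permutation layer discussed in the abstract above can be modeled functionally in a few lines of Python. The table values and the permutation rule are those of the published PRESENT specification; the function names are illustrative, and this functional model says nothing about the static CMOS realizations compared in the paper.

```python
# PRESENT 4-bit S-box: SBOX[x] for x = 0x0 .. 0xF.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def sbox_layer(state):
    """Apply the S-box to each of the 16 nibbles of a 64-bit state."""
    out = 0
    for i in range(16):
        nibble = (state >> (4 * i)) & 0xF
        out |= SBOX[nibble] << (4 * i)
    return out

def p_layer(state):
    """Bit permutation: bit i moves to position 16*i mod 63, and bit 63 stays put.

    In hardware this is pure wiring, which is why the S-box dominates the cost.
    """
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << dest
    return out

print(hex(sbox_layer(0)))  # prints 0xcccccccccccccccc
```

Because the S-box is a 4-input, 4-output Boolean function, each output bit can be read off this table as a truth table, which is the starting point for a gate-level S-box realization.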

Emerging Technologies and Memory

Frontmatter
Modeling and Analysis of Transient Heat for 3D IC

The three-dimensional integrated circuit (3D IC) is a promising technology in the semiconductor industry. 3D ICs provide several benefits over conventional 2D ICs. However, thermal issues are a major concern due to high power density, which makes thermal management a challenging task for 3D ICs. This paper presents a new thermal model for accurately calculating the temperature of a 3D IC. The model is simulated for 3D ICs to study the effects of various parameters, such as the thermal conductivities of the interface sub-layers, the heat sink, and the power dissipation, on the temperature of the IC. The effect of these parameters on the transient thermal behavior of the IC is also examined.

Subhajit Chatterjee, Surajit Kr. Roy, Chandan Giri, Hafizur Rahaman
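A common first-order way to study transient heating in a die stack, such as the 3D IC discussed above, is a lumped thermal RC ladder with one node per layer. The sketch below is that generic textbook model, not the paper's thermal model: the node ordering, parameter names, and forward-Euler integration are assumptions, and the closed-form steady state is included as a check.

```python
def simulate_thermal_ladder(r, c, p, t_amb, dt, steps):
    """Forward-Euler transient of a 1D lumped thermal RC ladder.

    Node 0 is nearest the heat sink and connects to ambient through r[0];
    node i (i >= 1) connects to node i-1 through r[i].
    r: thermal resistances (K/W), c: heat capacities (J/K), p: power per node (W).
    Returns the node temperatures after `steps` time steps of length dt (s).
    """
    n = len(c)
    temp = [t_amb] * n
    for _ in range(steps):
        new = temp[:]
        for i in range(n):
            below = t_amb if i == 0 else temp[i - 1]
            q_net = p[i] - (temp[i] - below) / r[i]        # heat leaving downward
            if i + 1 < n:
                q_net += (temp[i + 1] - temp[i]) / r[i + 1]  # heat arriving from above
            new[i] = temp[i] + dt * q_net / c[i]
        temp = new
    return temp

def steady_state(r, p, t_amb):
    """Closed form: all power injected at or above node k flows through r[k]."""
    temps, t = [], t_amb
    for k in range(len(r)):
        t += r[k] * sum(p[k:])
        temps.append(t)
    return temps

# Hypothetical two-layer stack: after 20 s the transient has settled.
r, c, p = [1.0, 2.0], [0.5, 0.2], [1.0, 0.5]
print(simulate_thermal_ladder(r, c, p, 300.0, 0.001, 20000))
print(steady_state(r, p, 300.0))
```

The explicit Euler step must stay well below the smallest RC time constant for stability; an implicit solver would allow larger steps.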
Memory Efficient Fractal-SPIHT Based Hybrid Image Encoder

This paper presents a hardware implementation of a hybrid coder based on the fractal and SPIHT image compression techniques. This hybridization improves the time complexity of the fractal image encoder and achieves the desired image quality at varying bit rates. The LL subband of the wavelet-transformed image is used for fractal encoding, and the other subbands are encoded with the SPIHT encoder. Both image compression techniques are analyzed, and the performance of the hybrid technique is evaluated on different test images. The architecture operates in real time and can encode a $$256 \times 256$$ image within 7 ms.

Mamata Panigrahy, Nirmal Chandra Behera, B. Vandana, Indrajit Chakrabarti, Anindya Sundar Dhar
Metal-Oxide Nanostructures Designed by Glancing Angle Deposition Technique and Its Applications on Sensors and Optoelectronic Devices: A Review

Glancing angle deposited (GLAD) metal-oxide nanostructure films are promising materials for sensor and optoelectronic device applications owing to their easy fabrication process, structure-dependent properties, and large surface-to-volume ratio. This paper reviews the literature on metal-oxide nanostructures deposited by GLAD using the various possible deposition techniques, such as thermal/electron-beam evaporation, magnetron sputtering, and pulsed laser deposition. The principle behind the formation of nanostructures through GLAD is also discussed in detail. A detailed analysis of optoelectronic and sensor devices based on GLAD-deposited metal-oxide nanostructures, and of their operating principles, is also presented. This review should help readers understand and further explore the growth of metal-oxide nanostructures using the glancing angle deposition technique for future sensor and optoelectronic device applications.

Divya Singh
Low Write Energy STT-MRAM Cell Using 2T- Hybrid Tunnel FETs Exploiting the Steep Slope and Ambipolar Characteristics

Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) is one of the best candidates among emerging non-volatile memories. High write energy is a bottleneck for CMOS-based 1T and 2T STT-MRAM cells as they scale. To reduce the write energy of an STT-MRAM cell, this paper proposes a novel STT-MRAM cell based on 2T hybrid (hetero-junction and homo-junction) Tunnel Field Effect Transistors (TFETs). Owing to the combined steep-slope and ambipolar characteristics of TFETs, the proposed 2T hybrid TFET based STT-MRAM cell has lower write energy and shorter switching time than 1T/2T FinFET, 1T/2T hetero-junction TFET, and 1T/2T homo-junction TFET based STT-MRAM cells.

Y. Sudha Vani, N. Usha Rani, Ramesh Vaddi
Enhancing Retention Voltage for SRAM

In modern integrated chips, most of the power consumption comes from the memory blocks. These memory blocks require high rail voltages due to limited noise margins. The aim of this work is therefore to design assist circuitry that allows the retention voltage, and consequently the memory power consumption, to be reduced. We first designed a stable SRAM cell in 65 nm CMOS technology, along with read and write assist circuits. These assist circuits enable lower operating voltages. Transient Voltage Collapse Write Assist (TVC-WA) improves the writability of the SRAM cell by reducing write latency by 44%, and Wordline Under-Drive Read Assist (WLUD-RA) improves read stability. Both circuits allow the SRAM supply voltage to be reduced, thereby reducing its power dissipation.

Ankit Rehani, Sujay Deb, Suprateek Shukla
Comparison of SRAM Cell Layout Topologies to Estimate Improvement in SER Robustness in 28FDSOI and 40 nm Technologies

The impact of high-energy particles on digital memory elements becomes more important as technology scales down. Memory elements contain high-density latches to store data, and these latches are susceptible to disturbances from particle strikes. Alpha particles and cosmic-ray neutrons may cause Single Event Upsets (SEUs) in memory cells. In this paper, we propose a method to estimate and compare the Soft Error Rate (SER) robustness of different SRAM cell layout topologies. We demonstrate that radiation-hardened layout topologies offer much better SER robustness than the conventional layout of the 6T SRAM cell in 28FDSOI and 40 nm technologies. The analysis is done using the ELDO simulator for a wide range of Linear Energy Transfer (LET) profiles of particle strikes.

Anand Ilakal, Anuj Grover
Improving the Design of Nearest Neighbor Quantum Circuits in 2D Space

Existing quantum circuits restrict qubit interactions to neighboring qubits, which leads to communication overhead. Recent papers have therefore developed several optimization methods that account for this constraint. However, most works limit their synthesis methods to 1D quantum architectures, and only a few address multidimensional quantum circuits. We therefore focus on qubit-to-qubit interactions over a 2D grid to obtain efficient representations of quantum circuits. Beyond designing 2D circuits, we have also developed a strategy to make such circuits Nearest Neighbor (NN) compliant. We present two ways to make a quantum circuit NN compliant in the 2D plane. The first approach is naïve, whereas the second, which relies on qubit-to-qubit interactions, produces efficient representations by minimizing the use of SWAP gates. We have tested the strategy on several benchmarks and compared the results with recent work.

Neha Chaudhuri, Chandan Bandyopadhyay, Hafizur Rahaman
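To make the SWAP overhead of nearest-neighbor compliance in the quantum-circuit abstract above concrete: on a 2D grid, two qubits at Manhattan distance d can be made adjacent with d - 1 SWAP gates along a shortest path. The sketch below counts SWAPs for a hypothetical naive scheme that moves one qubit next to its partner and back for every non-adjacent two-qubit gate; it is an illustrative baseline under that assumption, not either of the paper's two approaches.

```python
def manhattan(a, b):
    """Grid distance between two (row, col) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def naive_swap_count(placement, gates):
    """Count SWAPs for a naive NN-compliance scheme on a 2D grid.

    placement: dict qubit -> (row, col); gates: list of (qubit1, qubit2) pairs.
    For each gate whose qubits are d > 1 cells apart, one qubit is moved d - 1
    steps along a shortest path, the gate is applied, and the qubit is moved
    back: 2 * (d - 1) SWAPs, leaving the placement unchanged.
    """
    total = 0
    for q1, q2 in gates:
        d = manhattan(placement[q1], placement[q2])
        if d > 1:
            total += 2 * (d - 1)
    return total

# Hypothetical placement and gate list.
placement = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (2, 2)}
gates = [("a", "b"), ("a", "c"), ("a", "d")]
print(naive_swap_count(placement, gates))  # prints 8
```

Smarter schemes avoid the return trip and choose placements that keep frequently interacting qubits close, which is where SWAP savings come from.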

Devices and Technology – II

Frontmatter
Delay and Frequency Investigations in Coupled MLGNR Interconnects

Multilayer Graphene Nano-ribbons (MLGNRs) have been considered a potential replacement for conventional Cu in next-generation on-chip interconnects. In this paper, analytical models of transfer gain and crosstalk are derived for coupled three-line MLGNR interconnects using the ABCD modeling approach. For this purpose, an equivalent single-conductor model of GNRs is used. The proposed model takes into account the impact of mutual inductive and capacitive coupling among adjacent interconnects. Using the proposed model, the bandwidth of MLGNRs is determined. GNR interconnects are found to exhibit higher bandwidth and lower delay and power than their copper counterparts. The impact of input switching, transition time and interconnect length on crosstalk delay is also investigated. The analytical results agree well with SPICE simulations.

Manish Joshi, Koduri Teja, Ashish Singh, Rohit Dhiman
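The ABCD approach used in the MLGNR abstract above cascades 2×2 transmission matrices. The sketch below is a single-line illustration only, not the paper's coupled three-line model: the line is split into identical segments, each a series impedance followed by a shunt admittance, preceded by a driver resistance and terminated by a load capacitance, so the transfer gain is H = 1/(A + B·Y_L). The function names and all parameter values are hypothetical placeholders.

```python
import math

def matmul2(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]],
            [x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]]]

def series_abcd(z):
    """ABCD matrix of a series impedance z."""
    return [[1, z], [0, 1]]

def shunt_abcd(y):
    """ABCD matrix of a shunt admittance y."""
    return [[1, 0], [y, 1]]

def transfer_gain(freq, r_drv, r_line, l_line, c_line, c_load, segments=50):
    """|Vout/Vin| for driver resistance + N-segment RLC line + capacitive load.

    r_line, l_line, c_line are total line resistance (ohm), inductance (H)
    and capacitance (F); each segment gets 1/segments of each.
    """
    w = 2 * math.pi * freq
    z_seg = (r_line + 1j * w * l_line) / segments
    y_seg = 1j * w * c_line / segments
    segment = matmul2(series_abcd(z_seg), shunt_abcd(y_seg))
    abcd = series_abcd(r_drv)
    for _ in range(segments):
        abcd = matmul2(abcd, segment)
    a, b = abcd[0][0], abcd[0][1]
    # Output loaded by c_load: Iout = Y_L * Vout, so Vin = (A + B * Y_L) * Vout.
    return abs(1 / (a + b * 1j * w * c_load))

for f in (0.0, 1e9, 1e11):
    print(f, transfer_gain(f, 100.0, 1000.0, 1e-9, 1e-13, 1e-15))
```

At DC the gain is exactly 1 (no current flows into the capacitive load); the frequency at which it falls to 1/√2 gives the 3 dB bandwidth.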
LISOCHIN: An NBTI Degradation Monitoring Sensor for Reliable CMOS Circuits

Reliability and variability issues are the biggest design challenges facing nanoscale high-speed applications. Negative bias temperature instability (NBTI) is a major reliability issue in scaled devices. The effect of NBTI grows over time and increases the threshold voltage of PMOS transistors. This paper presents an NBTI degradation sensor that monitors the change in the standby leakage current ($$I_{ddq}$$) of the test circuit under stress conditions. The response of the proposed sensor is linear and highly sensitive. Owing to its high sensitivity, the sensor is well suited for compensating temporal degradation during measurement. Its sensitivity increases further at elevated temperature (125 $$^{\circ }$$C) compared with room temperature (27 $$^{\circ }$$C). The proposed sensor improves sensitivity by 20.12% and 74.82% compared with a CM-based sensor at room temperature and elevated temperature, respectively. The transimpedance of the proposed sensor is linear, and the linearity is unaffected by voltage and temperature variations. The proposed sensor is 25% smaller and has a faster response than the CM-based sensor. It is also unaffected by supply voltage variations.

Ambika Prasad Shah, Nandakishor Yadav, Santosh Kumar Vishvakarma
Performance Analysis of OLED with Hole Block Layer and Impact of Multiple Hole Block Layer

Organic electronics has become one of the most important fields of research owing to its mechanical flexibility and low-temperature fabrication. Organic devices and circuits are steadily improving in performance and reliability. The organic light-emitting diode (OLED) is one of the emerging areas in this regard. Here, we attempt to improve OLED performance by adding hole block layers. The impact of the number of hole block layers added to the device is further analyzed. It is observed that adding hole block layers improves device performance up to a point, beyond which adding more layers degrades performance. Organic-material-based devices and circuits have been identified as a thrust area by the International Technology Roadmap for Semiconductors (ITRS). OLEDs are a considerably better candidate for large-area electronic displays.

Shubham Negi, Poornima Mittal, Brijesh Kumar
Improved Gate Modulation in Tunnel Field Effect Transistors with Non-rectangular Tapered Y-Gate Geometry

In this work, a novel approach is investigated to overcome one of the major issues faced by Tunnel FETs, namely their low drive current (On-current). The approach uses a non-rectangular tapered gate electrode geometry that concentrates the electric field lines emanating from the gate electrode toward the source/channel tunneling junction, which enhances the band-to-band tunneling current.

Rakhi Narang, Mridula Gupta, Manoj Saxena
A 36 nW Power Management Unit for Solar Energy Harvesters Using 0.18 µm CMOS

This work presents the design of an ultra-low-power (ULP) management unit for use with tiny solar cells or energy harvesters that provide very low power, enabling energy autonomy in wireless sensor node (WSN) applications. The power management unit (PMU) is implemented in $$0.18\,\upmu $$m CMOS with MOSFETs operating in the subthreshold region to reduce power consumption and increase efficiency. It regulates the output voltage at 0.95 V and 0.968 V when the input voltages are 0.98 V and 1.33 V, respectively, and achieves a maximum efficiency of 72.3%. The proposed PMU consumes 36 nW and 56 nW at input voltages of 0.98 V and 1.33 V, respectively, making it suitable for ultra-low-voltage, low-power applications.

Purvi Patel, Biswajit Mishra, Dipankar Nagchoudhuri
A 10T Subthreshold SRAM Cell with Minimal Bitline Switching for Ultra-Low Power Applications

High noise margins and low power dissipation are the major attributes of SRAM cells used in ultra-low power applications. This paper proposes a 10T Static Random-Access Memory (SRAM) cell with data-aware dynamic feedback control and disturb-free read, which enhances the noise margins in the subthreshold region. Exploiting the dynamic threshold MOS (DTMOS) technique reduces the read access time of the proposed cell. The cell offers single-ended write operation with the bitlines kept at logic HIGH, which yields large savings in the dynamic power spent charging and discharging the bitlines. The proposed SRAM therefore reduces the bitline discharge activity factor for each write pattern. Simulations are carried out in a 65 nm technology node to compare the proposed cell with existing techniques. The proposed cell has a write static noise margin (WSNM) 1.7x and 1.48x that of iso-area 6T and Schmitt Trigger based (ST2) SRAM cells, respectively, at a supply voltage of 300 mV. The read operation is data-controlled, which improves the read margin. The dynamic threshold technique increases the read current for faster reads. The read SNM is 2x, 1.16x and 1.4x that of the iso-area 6T, differential data-aware 9T and Schmitt Trigger (ST2) SRAM cells, respectively. These features make the cell suitable for ultra-low power applications.

Swaati, Bishnu Prasad Das
Variability Investigation of Double Gate JunctionLess (DG-JL) Transistor for Circuit Design Perspective

The present work investigates the variability in the circuit performance of the Double Gate JunctionLess (DG-JL) architecture due to variation in device parameters such as operating temperature (T), channel doping (Nch), and doping profile (e.g., Gaussian). We also evaluate the impact of interface charges on the performance of a DG-JL based CMOS inverter. A conventional CMOS inverter and an amplifier circuit are used to demonstrate the performance of the DG-JL architecture. The evaluated parameters are the transfer characteristics, noise margin, propagation delay, inverter current and amplifier gain. In addition, the influence of gate oxide permittivity on the transfer characteristics of the CMOS inverter is investigated. The results show that changing the doping profile (i.e., from uniform to Gaussian) has a smaller impact on device performance. However, changes in peak doping concentration and operating temperature, and the presence of interface charges, lead to significant changes in inverter characteristics in terms of both noise margin and propagation delay.

Vandana Kumari, Manoj Saxena, Mridula Gupta

System Design

Frontmatter
A High Speed KECCAK Coprocessor for Partitioned NSP Architecture on FPGA Platform

Messages in the latest security protocols such as IPSec, TLS and SSL must be handled by high-speed cryptosystems. Current computationally intensive cryptographic implementations on platforms such as software, Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs) achieve lower throughput than is possible when they lack adequate optimization. This paper considers the cryptographic hash algorithm KECCAK and its implementations. To achieve better throughput, the proposed KECCAK implementations explore the FPGA design space. Three different KECCAK coprocessor architectures are implemented on the Artix-7 (XC7A100T, CSG324) FPGA platform. The Processing Element (PE) handles all communication interfaces, data paths and control-signal hazards of the Network Security Processor (NSP). A partitioned area in the system ensures that the processor data path is completely isolated from the secret key memory. Communication between memory and the KECCAK core is handled by a Direct Memory Access (DMA) controller. The implemented KECCAK cores outperform existing work reported in the literature in terms of throughput and resource usage.

Rourab Paul, Sandeep Kumar Shukla
New Energy Efficient Reconfigurable FIR Filter Architecture and Its VLSI Implementation

High-performance, energy-efficient reconfigurable FIR filters are an essential requirement in modern wireless communication applications. The transposed-form block FIR filter based on distributed arithmetic (DA) is well suited to the requirements of such applications. This paper therefore presents a new energy-efficient, multiplier-less transposed-form block FIR filter architecture for reconfigurable applications using a DA-based approach. The proposed architecture provides an improved area-delay product (ADP) and reconfigurability by employing an efficient coefficient storage unit and add-and-shift multiplication logic, respectively. FPGA synthesis results show that the proposed architecture reduces energy per sample by 13.15% and 13.33% for a filter length of 64 with block sizes of 4 and 8 samples, respectively, compared with the existing design. Further, ASIC results for a filter length of 64 and a block size of 8 show a 20.91% reduction in ADP and a 32.86% reduction in area over the existing architecture.

Naushad Ali, Bharat Garg
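Distributed arithmetic, the basis of the FIR filter architecture above, replaces the multiplications of an inner product with look-ups of precomputed coefficient sums followed by shift-and-add accumulation over the input bit planes. The sketch below assumes integer coefficients and unsigned fixed-width input samples (two's-complement inputs need an extra subtracted sign-bit term), and it shows only a direct-form DA inner product, not the transposed-form block architecture proposed in the paper. Function names are hypothetical.

```python
def build_da_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] for every k whose bit is set in addr (2**K entries)."""
    k = len(coeffs)
    return [sum(coeffs[j] for j in range(k) if (addr >> j) & 1) for addr in range(1 << k)]

def da_inner_product(coeffs, samples, bits):
    """Multiplier-less sum(h_k * x_k) for unsigned `bits`-bit samples.

    For each bit plane b, bit b of every sample forms a LUT address; the
    looked-up partial sum is shifted left by b and accumulated.
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b
    return acc

def fir_da(coeffs, signal, bits=8):
    """y[n] = sum_k h[k] * x[n-k], each output computed with da_inner_product."""
    taps = len(coeffs)
    padded = [0] * (taps - 1) + list(signal)
    # padded[n:n + taps] holds x[n-taps+1] .. x[n]; reversing aligns x[n-k] with h[k].
    return [da_inner_product(coeffs, padded[n:n + taps][::-1], bits)
            for n in range(len(signal))]

print(fir_da([1, 2, 3], [1, 2, 3, 4]))  # prints [1, 4, 10, 16]
```

The LUT grows as 2^K with the number of taps K, which is why practical DA designs partition long filters into small sub-filters and why coefficient storage dominates the area discussion.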
FPGA-Based Smart Camera System for Real-Time Automated Video Surveillance

Automated video surveillance is a rapidly evolving area and has been gaining importance in the research community in recent years owing to its ability to perform more efficient and effective surveillance using smart cameras. In this article, we present the design and implementation of an FPGA-based smart camera system for automated video surveillance. The complete system is prototyped on the Xilinx ML510 FPGA platform and meets the real-time requirements of video surveillance applications while reducing FPGA resource usage. The implemented smart camera system can automatically perform real-time motion detection, video history generation, focused region extraction, filtering of frames of interest, and tracking of an identified target with automatic purposive camera movement. The system is designed to work in real time on live color video streams of standard PAL (720 × 576) resolution, the most commonly used video resolution for current-generation surveillance systems. The system can also process HD-resolution video streams in real time.

Sanjay Singh, Sumeet Saurav, Ravi Saini, Atanendu S. Mandal, Santanu Chaudhury
Effectiveness of High Permittivity Spacer for Underlap Regions of Wavy-Junctionless FinFET at 22 nm Node and Scaling Short Channel Effects

In this work, we investigate the performance of a new device, the Wavy Junctionless FinFET, at the 22 nm node, using low- to high-permittivity spacers for the underlap regions. An alternative $$V_{TH}$$ extraction method is demonstrated, which highlights the importance of channel length in the nanoscale regime. The silicon film of the device layer has a uniform doping profile, so the current is controlled by the channel doping and the carrier mobility, reflecting bulk conduction rather than surface conduction. As device dimensions scale, underlap regions are preferred to differentiate the control and the location of dopant atoms along the conduction region, which enhances device performance. The simulation results demonstrate the effectiveness of a high-permittivity spacer region. The simulated device exhibits an SS of 64 mV/decade, a DIBL of 26 mV/V and an $$I_{ON}/I_{OFF}$$ ratio of $$10^7$$.

B. Vandana, J. K. Das, S. K. Mohapatra, B. K. Kaushik
Design and Implementation of Ternary Content Addressable Memory (TCAM) Based Hierarchical Motion Estimation for Video Processing

This paper proposes block-based Hierarchical Motion Estimation (ME) using Ternary Content Addressable Memory (TCAM). Conventional approaches estimate motion using nearest-neighbourhood search, in which computing the search locations has high complexity. The novelty of the proposed work is to accelerate the estimation process using mixed parallel and pipeline processing with TCAM. This technique simultaneously searches for pixel variations between the current block and different reference blocks of a frame by checking both complete and partial matches. If a match is found at the same location in block space, there is no motion; if a match is found at a location other than the existing one, motion is considered to have occurred. If matches occur at more than one location, the best match is found using the Sum of Absolute Differences (SAD) between $$8\times 8$$ pixel blocks. Motion vectors are computed for complete as well as partial matches of an $$8\times 8$$ block within two $$16\times 16$$ blocks. A TCAM engine is designed to store the pixels of the reference frame. A search operation is then performed using a $$16\times 16$$ current block against two $$16\times 16$$ reference blocks or one $$32\times 16$$ block. This operation takes 382 clock cycles, and the hardware resource utilization is 33%. The complete architecture is designed in Verilog and implemented on a Virtex-7 FPGA and as an ASIC.

Puja Ghosh, P. Rangababu
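Two building blocks named in the motion-estimation abstract above can be sketched functionally: ternary matching, where masked-out bits act as "don't care", and SAD-based selection among several candidate blocks. Representing TCAM entries as (value, care_mask) integer pairs, modeling partial matches through don't-care bits, and the function names are all assumptions of this sketch; it is not the paper's hardware engine.

```python
def tcam_match(key, entries):
    """Return indices of TCAM entries that match `key`.

    Each entry is (value, care_mask); bits where care_mask is 0 are "don't care",
    so an entry matches when (key XOR value) AND care_mask == 0.
    """
    return [i for i, (value, mask) in enumerate(entries) if ((key ^ value) & mask) == 0]

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_match(current, candidates):
    """Return the (position, block) candidate with the lowest SAD against `current`."""
    return min(candidates, key=lambda item: sad(current, item[1]))

# Hypothetical 2x2 blocks standing in for 8x8 ones.
current = [[1, 2], [3, 4]]
candidates = [((0, 0), [[9, 9], [9, 9]]), ((0, 8), [[1, 2], [3, 5]])]
print(tcam_match(0b1011, [(0b1011, 0b1111), (0b1000, 0b1100), (0b0011, 0b1111)]))  # prints [0, 1]
print(best_match(current, candidates)[0])  # prints (0, 8)
```

A hardware TCAM evaluates every entry in parallel in one lookup; the list comprehension here only mirrors the matching rule.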
A Custom Designed RISC-V ISA Compatible Processor for SoC

RISC-V is an open Instruction Set Architecture (ISA) released by the Berkeley Architecture Group of the University of California, Berkeley (UCB) in 2010. This paper presents the architecture, design and complete implementation of a 32-bit customisable processor system containing the mix of features described below. The 32-bit processor, based on the RISC-V ISA, handles atomic operations in addition to all integer operations supported by the ISA. The design has a priority-based nested interrupt controller, giving the user added flexibility to program interrupt priority levels. In addition, a debug unit provides internal visibility during program execution. An error detection and correction interface to the memories makes the design resilient to radiation-induced bit flips. The on-chip communication interface follows the standard Wishbone specification. The design has been implemented on a Xilinx Virtex-7 XC7VX48T FPGA and achieves a peak frequency of 80 MHz, with the stand-alone processor operating at 190 MHz. On a 65 nm technology node, the design operates at 170 MHz, while the stand-alone processor reaches a maximum frequency of 220 MHz. The design occupies a footprint of 1.027 mm$$^2$$ with 32 KB of on-chip memory.

Kavya Sharat, Sumeet Bandishte, Kuruvilla Varghese, Amrutur Bharadwaj

Low Power Design and Test

Frontmatter
An Efficient Timing and Clock Tree Aware Placement Flow with Multibit Flip-Flops for Power Reduction

The multibit flip-flop (MBFF) approach has attracted significant interest in the literature as a promising way to minimize the power consumption of the clock network in modern System on Chip (SoC) designs. However, in real designs with complex architectures, applying MBFFs without full awareness of placement and clock tree information may adversely affect design attributes. These effects include heavy congestion after clock tree synthesis (CTS) and long wire lengths that lead to higher voltage drop and timing violations. This paper introduces a novel placement methodology, integrated with the existing electronic design automation (EDA) flow and tools, for MBFF generation with prior knowledge of the clock tree architecture. In addition, an algorithm for minimizing the clock insertion delay (CID) of the design is proposed. The algorithm reduces the CID by identifying the clock tree nets and clock tree sinks that violate the CID at the early CTS stage. The proposed methodology is validated on two complex designs that target real applications. It reduces flip-flop power consumption by 50.46% and 37.7% for designs I and II, respectively. Furthermore, core density improves by 12.8% and 9.8% for designs I and II, respectively. An average reduction of 9.2% in the CID validates the superiority of the proposed algorithm over the existing algorithm.

Jasmine Kaur Gulati, Bhanu Prakash, Sumit Darak
Primitive Instantiation Based Fault Localization Circuitry for High Performance FPGA Designs

The ever-increasing demand for higher VLSI circuit performance, together with denser logic packing and shrinking device dimensions, has made FPGAs more vulnerable to reliability hazards, reducing the reliability and lifetime of VLSI chips. In this paper, we propose circuit techniques, embedded alongside the original design, that detect faulty FPGA logic slices without significantly compromising performance. The circuits are realized through primitive instantiation and constrained placement, which makes it possible to trace the exact faulty location so that faulty zones can be conveniently bypassed for fault-free circuit operation.

Ayan Palchaudhuri, Anindya Sundar Dhar
On Generation of Delay Test with Capture Power Safety

Applying manufacturing tests without violating the circuit's power budget is one of the primary concerns of test engineers today. Excessive power demand often triggers false failures and hence reduces yield. Many automatic test pattern generation (ATPG) algorithms and test set modification methods have been proposed to minimize power during test. However, the reduction achieved is often insufficient, because the circuit's functional power budget is usually much smaller than the power drawn by high-activity test patterns. This paper proposes an optimization problem formulation that targets test generation for transition delay faults without exceeding the operative power limit. Using this formulation, tests are generated for slow-to-rise and slow-to-fall transition delay faults. The proposed method can produce both Launch-On-Capture and Launch-On-Shift delay vectors. A pseudo SAT-based solver can be used to solve the formulated optimization problem. Because the formulation maximizes the number of faults detected under the circuit's functional and power constraints, it yields a compact test set. Experiments conducted on ISCAS89 benchmark circuits support the effectiveness of the proposed technique.

Rohini Gulve, Nihar Hage
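The capture-power paper constrains switching activity during test, but the abstract does not give the power model. A common first-order proxy is the number of scan cells that toggle between the launch and capture states. The sketch below assumes that toggle-count proxy, a single scan chain for the Launch-On-Shift case, and hypothetical function names; it only checks a given pattern against a budget and does not reproduce the paper's optimization.

```python
# Hypothetical sketch: toggle count as a proxy for capture power.

def toggle_count(before, after):
    """Number of scan cells whose value changes between two states."""
    if len(before) != len(after):
        raise ValueError("states must cover the same scan cells")
    return sum(1 for a, b in zip(before, after) if a != b)

def los_launch_state(initial, scan_in_bit):
    """Assumed single-chain Launch-On-Shift: one extra shift moves every
    cell one position along the chain and loads scan_in_bit at the head."""
    return (scan_in_bit,) + tuple(initial[:-1])

def capture_safe(launch, capture, max_toggles):
    """True if the launch-to-capture transition stays within the toggle budget."""
    return toggle_count(launch, capture) <= max_toggles
```

In an optimization-based generator such as the one the abstract describes, a bound like `max_toggles` would appear as a constraint on candidate patterns rather than as a post-hoc filter.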
A Configurable and Area Efficient Technique for Implementing Isolation Cells in Low Power SoC

In SoC design, isolation cells are placed between power domains to prevent the floating outputs/inputs of power-gated blocks from affecting the operation of active circuits. Modern low-power SoCs use millions of isolation cells to implement different power-gating modes, and these cells occupy considerable silicon area. In addition, isolation values in low-power designs are predetermined (fixed to ‘0’ or ‘1’ in the design itself) and cannot be configured during real-time operation. Hence, an incorrect isolation value may render the device unusable in low-power modes. In this paper, we propose a modified clamping circuit design that reduces the area and delay of isolation cells. We also propose a method to configure the isolation values of certain qualifier signals, together with the subsequent procedure by which power-gated modules enter deep-sleep mode. The results show that the proposed technique can improve the reliability of power-gating modes and reduce isolation cell area by 30% to 50% compared with the conventional logic-gate-based isolation technique.

Prokash Ghosh, Jyotirmoy Ghosh
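To make the contrast with fixed-value isolation concrete, here is a hypothetical behavioral model, not the authors' clamping circuit. Conventional AND-type and OR-type isolation clamp to a hard-wired 0 or 1. A run-time-configurable cell instead takes its clamp value from a signal such as a configuration register bit. The active-high `iso_en` polarity is an assumption made for this sketch.

```python
# Hypothetical behavioral model; iso_en is assumed to be active-high.

def isolation_cell(data, iso_en, clamp_value):
    """Configurable clamp: pass data when isolation is off, drive clamp_value when on."""
    if clamp_value not in (0, 1):
        raise ValueError("clamp_value must be 0 or 1")
    return clamp_value if iso_en else data

def and_isolation(data, iso_en):
    """Conventional fixed clamp-to-0 isolation built from an AND gate."""
    return data & (1 - iso_en)

def or_isolation(data, iso_en):
    """Conventional fixed clamp-to-1 isolation built from an OR gate."""
    return data | iso_en
```

With `clamp_value` tied to 0 or 1, the configurable model reproduces the AND-type or OR-type behavior exactly. The difference is that the value can be chosen at run time instead of being fixed at design time.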

RF Circuits

A 10 MHz, 42 ppm/°C, 69 μW PVT Compensated Latch Based Oscillator in BCD9S Technology for PCM

This paper reports a PVT-compensated 10 MHz oscillator in 0.11 µm BCD9S (Bipolar CMOS DMOS) technology for embedded phase change memories (PCM). Around 10 MHz, the proposed oscillator shows a frequency deviation of ±0.4% at the typical corner, ±2% at the slow corner, and ±1.5% at the fast corner across −40 °C to 150 °C with a regulated 1.8 V supply. This represents a significant advance over the existing state of the art in frequency references.

Vivek Tyagi, M. S. Hashmi, Ganesh Raj, Vikas Rana
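The 42 ppm/°C figure appears consistent with the typical-corner deviation quoted in the abstract. Assuming the temperature coefficient is computed with the box method, that is, the full ±0.4% (0.8% peak-to-peak) frequency spread divided by the 190 °C span from −40 °C to 150 °C, the arithmetic is:

$$\mathrm{TC} = \frac{\Delta f / f_0}{\Delta T} = \frac{0.8\%}{190\,^{\circ}\mathrm{C}} = \frac{8000\ \mathrm{ppm}}{190\,^{\circ}\mathrm{C}} \approx 42.1\ \mathrm{ppm}/^{\circ}\mathrm{C}$$

Under the same assumption, the slow-corner (±2%) and fast-corner (±1.5%) spreads would correspond to roughly 211 ppm/°C and 158 ppm/°C, respectively.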
A 1.8 V Gain Enhanced Fully Differential Doubly-Recycled Cascode OTA with 100 dB Gain 200 MHz UGB in CMOS

A fully differential OTA based on a modified doubly recycling current technique is presented. The proposed technique uses a Gm-boosted cascode stage at the output, thereby enhancing the DC gain of the recycling cascode OTA while improving phase margin. A DC gain of 102 dB is achieved, almost 20 dB more than existing architectures designed for a 1.8 V supply. The enhanced gain also helps reduce the input-referred noise to 10 $$\upmu $$V/$$\mathrm{{\sqrt{Hz}}}$$ at 10 Hz. The designed OTA achieves a UGB of 200 MHz at a capacitive load of 10 pF, which makes it suitable for high-speed applications. The OTA is designed in a standard 45 nm CMOS process. The two-stage OTA uses an MCNR approach to emulate a first-order phase response below the UGB, giving a phase margin of more than 69$$^{\circ }$$ for a typical load of 10 pF. The slew rate is 105 V/$$\upmu $$s for a load of 1 pF.

Antaryami Panigrahi, Abhipsa Parhi
A Low Power, Frequency-to-Digital Converter CMOS Based Temperature Sensor in 65 nm Process

A low-power, all-CMOS smart temperature sensor is introduced that uses neither a bandgap reference nor a current/voltage analog-to-digital converter. To keep cost, power, and area low, the proposed sensor operates in the sub-threshold region and generates a temperature-dependent frequency from a proportional-to-absolute-temperature (PTAT) current. A 12-bit asynchronous counter converts this temperature-dependent frequency into a digital output. A temperature-insensitive ring oscillator is designed to provide the counter's reference clock signal. The sensor is implemented in a standard 65 nm CMOS process, and its operation is validated through post-layout simulation at supply voltages of 0.5–1 V. Over −55 °C to 125 °C, the sensor has an uncalibrated accuracy of +2.4/−2.1 °C and a resolution of 0.28 °C. The sensor consumes 1.55 µW and occupies 0.024 mm$$^2$$.

Mudasir Bashir, Sreehari Rao Patri, K. S. R. Krishna Prasad
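The frequency-to-digital principle in this abstract can be sketched as follows: count edges of the PTAT oscillator while a gate window, derived from the temperature-insensitive reference clock, is open. The sketch makes two assumptions that the paper does not state. First, the gate is `ref_cycles` reference periods long. Second, an ideal PTAT frequency is proportional to absolute temperature, so one known calibration point maps counts to kelvin. Frequencies are passed as integers in hertz so that the count is computed exactly.

```python
# Hypothetical sketch of frequency-to-digital conversion with a 12-bit counter.

COUNTER_BITS = 12                      # the abstract specifies a 12-bit counter
COUNTER_MAX = (1 << COUNTER_BITS) - 1  # 4095

def ftd_count(f_ptat_hz, f_ref_hz, ref_cycles):
    """PTAT-oscillator edges counted during a gate of ref_cycles reference periods.
    Clamped here for simplicity; a real ripple counter would wrap, so the gate
    length must be chosen to avoid overflow."""
    count = (f_ptat_hz * ref_cycles) // f_ref_hz
    return min(count, COUNTER_MAX)

def count_to_kelvin(count, count_at_ref, t_ref_k):
    """Assumes an ideal PTAT frequency (f proportional to T) and one calibration point."""
    return t_ref_k * count / count_at_ref
```

Under the ideal-PTAT assumption, one count step corresponds to `t_ref_k / count_at_ref` kelvin. The gate length therefore trades resolution against conversion time and counter overflow.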
Design & Development of High Speed LVDS Receiver with Cold-Spare Feature in SCL’s 0.18 µm CMOS Process

This paper presents the design and implementation of a Low Voltage Differential Signaling (LVDS) receiver chip in SCL’s 0.18 µm, 3.3 V CMOS process. The chip is compatible with the LVDS standard and is designed for a data rate of 1 Gbps. It contains four LVDS receiver channels. The die measures 2130 µm × 1500 µm and is packaged in a 16-pin ceramic flat pack (CFP). The chip architecture, design, and measured results are presented. Total ionizing dose (TID) testing up to 300 krad was performed on the chip, and single event effects (SEE) testing was carried out using Nickel (Ni58) and Silver (Ag107) heavy ions. Performance in the radiation environment is also reported.

Munish Malik, Ajay Kumar, H. S. Jatana

Architecture and CAD

Fast FPGA Placement Using Analytical Optimization

Placement for Field Programmable Gate Arrays (FPGAs) consumes half of the runtime of the design flow. As the number of cells grows with design size and complexity, this problem is gaining importance. FPGA block placement is typically based on simulated annealing. Because FPGA designs are smaller than their ASIC counterparts, simulated annealing has been feasible, since the runtime needed to place them is short. However, as design size and complexity increase, simulated annealing and genetic-programming-based algorithms become slow. In this paper, we target the runtime of FPGA placement. We propose a novel algorithm based on nonlinear analytical methods. It uses a density penalty approach, in which the spreading of blocks across the die is controlled by a squared penalty on unevenly filled regions of the die. Our method is fast: compared with VPR, it improves runtime by 750% while still producing a reasonably good placement.

Sameer Pawanekar, Gaurav Trivedi
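The analytical FPGA placer above minimizes wirelength plus a squared density penalty. Its exact objective is not given in the abstract, so the sketch below assumes a common form: half-perimeter wirelength (HPWL) plus λ times the sum, over square bins, of the squared occupancy overflow above a bin capacity. The names and the bin model are hypothetical. Analytical placers usually optimize smooth approximations of both terms; this version only evaluates the cost of a given placement.

```python
# Hypothetical cost model: HPWL + lam * sum over bins of max(0, occupancy - capacity)^2.
from collections import defaultdict

def hpwl(positions, nets):
    """Half-perimeter wirelength; positions maps block -> (x, y), nets are block lists."""
    total = 0.0
    for net in nets:
        xs = [positions[b][0] for b in net]
        ys = [positions[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def density_penalty(positions, bin_size, capacity):
    """Squared overflow of blocks per square bin; empty bins contribute nothing."""
    counts = defaultdict(int)
    for x, y in positions.values():
        counts[(int(x // bin_size), int(y // bin_size))] += 1
    return sum(max(0, n - capacity) ** 2 for n in counts.values())

def placement_cost(positions, nets, bin_size, capacity, lam):
    return hpwl(positions, nets) + lam * density_penalty(positions, bin_size, capacity)
```

Squaring the overflow penalizes heavily crowded bins much more than slightly overfull ones. During optimization this pushes blocks out of dense regions, and increasing λ over iterations trades wirelength for spreading.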
A Dependability Preserving Fluid-Level Synthesis for Reconfigurable Droplet-Based Microfluidic Biochips

Owing to their inherent reconfigurability, digital microfluidic biochips (DMFBs) have become a prime platform for critical medical diagnosis, real-time bioassays, and lab-on-chip experiments. However, dependability is essential for obtaining the correct outcome of a bioassay execution. To keep a DMFB dependable in high-frequency applications, no single electrode should be used too often, because frequent use can cause over-actuation. An over-actuated cell degrades over time and eventually fails. Current fluid-level synthesis methods consider only the minimization of the assay's total completion time. Moreover, recent techniques rely on extensive re-execution and perform costly online synthesis whenever such a fault is discovered. Two prior papers address the dependability issue and propose placement solutions. Here, we present a complete fluid-level synthesis that produces binding, scheduling, placement, and routing solutions for a given bioassay. The problem is proved to be NP-complete, and a dynamic programming formulation is used to obtain a solution in pseudo-polynomial time. Several benchmarks are used to evaluate the proposed method.

Arpan Chakraborty, Piyali Datta, Debasis Dhal, Rajat Kumar Pal
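The over-actuation concern in the DMFB abstract comes down to how many times each electrode is actuated over a schedule. The hypothetical sketch below counts actuations per electrode and flags those above a limit. The schedule format (one set of actuated `(row, col)` electrodes per time step) and the limit are assumptions, and the paper's actual synthesis method is not reproduced here.

```python
# Hypothetical check for over-actuated electrodes in a DMFB actuation schedule.
from collections import Counter

def actuation_counts(schedule):
    """schedule: list of time steps, each a set of (row, col) electrodes actuated in that step."""
    counts = Counter()
    for step in schedule:
        counts.update(step)
    return counts

def over_actuated(schedule, limit):
    """Electrodes actuated more than `limit` times, sorted for stable output."""
    return sorted(e for e, n in actuation_counts(schedule).items() if n > limit)
```

A dependability-aware synthesis flow would try to keep this list empty, for example by routing droplets over less-used electrodes.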
Splitting and Transport of a Droplet with No External Actuation Force for Lab on Chip Devices

In this work, we present a new droplet splitting and transport mechanism for lab-on-chip devices based on the surface wetting phenomenon. The proposed method can split and transport a droplet without applying any external force or voltage. A 3D multiphase lattice Boltzmann algorithm with a partially wetting surface is developed and simulated using the D3Q19 model. A superhydrophobic surface is realized experimentally using a selective painting approach. The experimental results validate the simulation predictions. The surface free energy characteristics during droplet transport are obtained and analyzed as a function of time.

T. Pravinraj, Rajendra Patrikar
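The droplet-splitting paper simulates with the D3Q19 lattice Boltzmann model. It does not say which multiphase scheme it uses, so the sketch below shows only what such schemes commonly build on: the standard D3Q19 velocity set (one rest, 6 face, and 12 edge directions) and the single-phase BGK equilibrium distribution with lattice sound speed squared equal to 1/3. Wetting and multiphase interaction forces are deliberately omitted.

```python
# Standard D3Q19 velocity set and BGK equilibrium (single phase only, in lattice units).
from fractions import Fraction
from itertools import product

def d3q19_lattice():
    """Return (velocities, weights): rest 1/3, faces 1/18, edges 1/36."""
    velocities, weights = [], []
    for e in product((-1, 0, 1), repeat=3):
        n = sum(abs(c) for c in e)
        if n == 0:
            w = Fraction(1, 3)
        elif n == 1:
            w = Fraction(1, 18)
        elif n == 2:
            w = Fraction(1, 36)
        else:
            continue  # corner directions belong to D3Q27, not D3Q19
        velocities.append(e)
        weights.append(w)
    return velocities, weights

def equilibrium(rho, u, velocities, weights):
    """f_i^eq = w_i * rho * (1 + 3 e.u + 4.5 (e.u)^2 - 1.5 u.u)."""
    usq = sum(c * c for c in u)
    feq = []
    for e, w in zip(velocities, weights):
        eu = sum(a * b for a, b in zip(e, u))
        feq.append(float(w) * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq
```

By construction, the equilibrium recovers the density as the sum of the distributions and the momentum as their first moment. These properties make a quick sanity check on any lattice implementation.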
Analytical Partitioning: Improvement over FM

Traditionally, VLSI standard cell placement has been driven by hypergraph partitioning tools such as hMetis and MLPart, which employ FM-based partitioning. However, in recent ISPD placement contests, none of the partition-driven placers produced a good solution. Hence, there is room for improvement in hypergraph partitioning algorithms. In this paper, we present a novel hypergraph partitioning algorithm based on nonlinear optimization. We solve nonlinear equations to partition the hypergraphs of the ISPD98 benchmarks. Our algorithm outperforms the well-known FM heuristic on 17 of 18 benchmarks, with an average improvement of 111.5% in cut quality.

Sameer Pawanekar, Gaurav Trivedi
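For reference, the FM baseline that the partitioning paper compares against ranks single-cell moves by gain. The gain is the reduction in the number of cut nets when a cell moves to the other side. The sketch below computes the hyperedge cut and the classic FM move gain for a two-way partition. It illustrates the baseline, not the authors' nonlinear method, and it omits FM's bucket structure and balance constraint.

```python
# Two-way hypergraph cut and classic FM single-move gain (baseline illustration).

def cut_size(nets, side):
    """Number of nets (lists of distinct cells) spanning both sides; side maps cell -> 0/1."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)

def fm_gain(cell, nets, side):
    """Cut reduction if `cell` moves to the other side."""
    frm = side[cell]
    gain = 0
    for net in nets:
        if cell not in net:
            continue
        from_count = sum(1 for v in net if side[v] == frm)
        to_count = len(net) - from_count
        if from_count == 1 and to_count > 0:
            gain += 1  # cell is alone on its side: moving it uncuts the net
        elif to_count == 0 and from_count > 1:
            gain -= 1  # net is entirely on cell's side: moving it cuts the net
    return gain
```

FM repeatedly moves the highest-gain unlocked cell, even when the gain is negative, and keeps the best prefix of moves. This is why it can climb out of shallow local minima while remaining a local-move heuristic.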
A Lifting Instruction for Performing DWT in LEON3 Processor Based System-on-Chip

Discrete Wavelet Transform (DWT) calculations form an inherent part of many signal processing applications. Application-specific instructions can increase the performance and efficiency of Systems-on-Chip (SoCs) that require DWT operations. In this paper, lifting-scheme-based hardware for efficient DWT computation is implemented as an instruction to enhance SoC performance. The hardware is integrated through the coprocessor interface of the SPARCv8-ISA-based LEON3 processor. Attaching the lifting hardware in this way is found to be much more efficient than the prevalent system-bus-based integration. Performance is reported in terms of CPI and MIPS, along with FPGA and ASIC implementation results for the SoC.

Rajul Bansal, Mahendra Kumar Jatav, Abhijit Karmakar
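The LEON3 paper implements a lifting-scheme DWT but does not say which wavelet it uses. As an assumed example, the sketch below shows one level of the integer LeGall 5/3 lifting transform, the reversible filter used in JPEG 2000. It consists of a predict step that produces detail coefficients, followed by an update step that produces smoothed coefficients, with symmetric extension at the borders. It handles only even-length signals.

```python
# Assumed example: one level of integer LeGall 5/3 lifting (even-length input).

def lift53_forward(x):
    """Return (s, d): approximation and detail coefficients."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch needs an even-length signal")
    n = len(x) // 2
    d = []
    for i in range(n):  # predict: odd samples minus the mean of their even neighbours
        right = x[2 * i + 2] if i + 1 < n else x[2 * i]  # symmetric extension
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    s = []
    for i in range(n):  # update: even samples plus a rounded quarter of neighbouring details
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d

def lift53_inverse(s, d):
    """Undo the update step, then the predict step, to recover x exactly."""
    n = len(s)
    even = []
    for i in range(n):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - (left + d[i] + 2) // 4)
    x = []
    for i in range(n):
        right = even[i + 1] if i + 1 < n else even[i]
        x.append(even[i])
        x.append(d[i] + (even[i] + right) // 2)
    return x
```

Each lifting step needs only a few additions and a shift, and each step can be inverted by simply reversing it. This is what makes lifting attractive for a compact, integer-exact hardware instruction.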
Droplet Position Estimator for Open EWOD System Using Open Source Computer Vision

Digital microfluidics (DMF) has emerged as a popular technology for lab-on-chip (LOC) applications, allowing full and independent control of droplets on an array of electrodes. In this work, we demonstrate a low-cost open electrowetting-on-dielectric (EWOD) system that tracks droplet position on a single substrate in real time. Printed circuit board (PCB) technology is used to fabricate the two-dimensional open EWOD device. Biocompatible polydimethylsiloxane (PDMS) serves as both the dielectric and the hydrophobic layer. Controlled droplet transport is achieved on the fabricated device, and droplet position detection is demonstrated using an open-source computer vision image processing tool. This work illustrates the promise of open two-dimensional EWOD devices for digital microfluidics applications.

Vandana Jain, Vasavi Devarasetty, Rajendra Patrikar
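A minimal version of image-based droplet position estimation thresholds a grayscale frame and takes the centroid of the foreground pixels from image moments. The sketch below does this in plain Python so that it runs without extra packages. In OpenCV, the same steps are usually done with `cv2.threshold` and `cv2.moments`. The assumptions are that the droplet appears brighter than the background and that electrodes form a square grid with a known pitch in pixels. The `pitch_px` parameter is hypothetical and not from the paper.

```python
# Hypothetical sketch: threshold + image moments -> droplet centroid -> electrode cell.

def droplet_centroid(frame, threshold):
    """frame: rows of grayscale values. Returns (cx, cy) in pixels, or None if nothing is found."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value > threshold:  # assumed: droplet pixels are brighter than background
                m00 += 1
                m10 += x
                m01 += y
    if m00 == 0:
        return None
    return (m10 / m00, m01 / m00)

def electrode_index(centroid, pitch_px):
    """Map a pixel centroid to (column, row) on an assumed square electrode grid."""
    cx, cy = centroid
    return (int(cx // pitch_px), int(cy // pitch_px))
```

Tracking then reduces to running this on each camera frame and comparing the reported electrode with the one the controller intended to actuate.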
Design and Implementation of Mixed Parallel and Dataflow Architecture for Intra-prediction Hardware in HEVC Decoder

The objective of the paper is to implement area-efficient hardware for intra prediction in a high efficiency video coding (HEVC) decoder for the DC, angular, and planar modes of all block sizes, viz., $$64\times 64$$, $$32\times 32$$, $$16\times 16$$, $$8\times 8$$ and $$4\times 4$$. The proposed hardware is written in Verilog and implemented on a Virtex-7 field programmable gate array (FPGA). The proposed design consumes fewer clock cycles than existing designs [7] because all three modes (DC, angular, and planar) are executed in parallel. The architecture processes 16 pixels (one $$4\times 4$$ block) in parallel for all three modes, so one predicted $$4\times 4$$ block is produced at the output in each clock cycle. Once prediction for one mode of a block is complete, its resources are released for use by the next mode or the next block. Resource consumption is therefore lower than in existing designs, which execute all modes for each block irrespective of the encoder information and thus waste resources.

Rituparna Choudhury, P. Rangababu
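Of the three modes handled by the HEVC intra-prediction hardware, DC is the simplest. Every sample of an N×N block is set to the rounded mean of the N reference samples above and the N to the left, computed as (Σtop + Σleft + N) >> (log2 N + 1). The sketch below computes only that value. It omits the boundary smoothing that the standard applies to the first row and column of smaller luma blocks, and it accepts only N in {4, 8, 16, 32}. It is not the paper's parallel datapath.

```python
# Sketch of the HEVC DC intra-prediction value (edge smoothing deliberately omitted).

def hevc_dc_predict(top, left):
    """top, left: N reference samples each; returns an N x N block filled with the DC value."""
    n = len(top)
    if n != len(left) or n not in (4, 8, 16, 32):
        raise ValueError("expected matching reference arrays of size 4, 8, 16 or 32")
    shift = n.bit_length()  # equals log2(n) + 1 for powers of two
    dc = (sum(top) + sum(left) + n) >> shift
    return [[dc] * n for _ in range(n)]
```

For a 4×4 block, the shift is 3, so the 8 reference samples are averaged with rounding. Hardware can compute this with an adder tree and a fixed shift, with no division needed.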

Design Verification

A Formal Perspective on Effective Post-silicon Debug and Trace Signal Selection

Despite the state explosion problem in today's large and complex designs, formal methods have been used for pre-silicon verification with limited success. This paper critically analyzes some of the reported work on using formal principles to find the root cause of bugs during post-silicon validation and debugging. Trace buffers help mitigate the limited observability of internal states during post-silicon debug. This paper proposes using the state restoration principle to increase the efficiency of formal post-silicon debugging methods. For the trace signal selection problem, a methodology based on formal principles is presented that makes the selected trace signals more effective.

Binod Kumar, Kanad Basu, Ankit Jindal, Brajesh Pandey, Masahiro Fujita
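State restoration, as used in trace signal selection, infers values of untraced signals from the traced ones. Selection quality is often summarized as a restoration ratio: (traced plus restored values) divided by traced values. The hypothetical sketch below performs only forward implication through AND, OR, and NOT gates within a single cycle. Practical restoration also propagates backward and across flip-flops over many cycles, and the netlist format is an assumption of this sketch.

```python
# Hypothetical single-cycle forward restoration; gates maps output -> (op, [inputs]).

def restore(gates, known):
    """Repeatedly apply forward implications until no new signal value can be inferred."""
    vals = dict(known)
    changed = True
    while changed:
        changed = False
        for out, (op, ins) in gates.items():
            if out in vals:
                continue
            iv = [vals.get(i) for i in ins]  # None marks an unknown value
            v = None
            if op == "NOT" and iv[0] is not None:
                v = 1 - iv[0]
            elif op == "AND":
                if 0 in iv:
                    v = 0  # a controlling 0 decides the output
                elif all(x is not None for x in iv):
                    v = 1
            elif op == "OR":
                if 1 in iv:
                    v = 1  # a controlling 1 decides the output
                elif all(x is not None for x in iv):
                    v = 0
            if v is not None:
                vals[out] = v
                changed = True
    return vals

def restoration_ratio(gates, traced):
    return len(restore(gates, traced)) / len(traced)
```

A selection heuristic would compare candidate trace sets by this ratio and prefer signals whose values imply many others, such as those that feed gates where they act as controlling values.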
Translation Validation of Loop Invariant Code Optimizations Involving False Computations

Code-motion-based optimizations are often used in electronic design automation (EDA) tools to improve the quality of synthesis results. Ensuring the correctness of such transformations is necessary for the reliability of EDA tools. A value propagation (VP) based equivalence checking method for finite state machines with datapaths (FSMDs) was proposed in [1] specifically to verify code motion across loops. In this work, we identify scenarios involving loop-invariant code motion in which the VP-based equivalence checking method fails to establish equivalence between two FSMDs that are in fact equivalent. We propose an enhancement to the VP-based equivalence checking method [1] that overcomes this limitation. Experimental results demonstrate that our method handles the scenarios in which the VP-based method fails.

Ramanuj Chouksey, Chandan Karfa, Purandar Bhaduri
A Framework for Automated Feature Based Mixed-Signal Equivalence Checking

The presence of real-valued variables that change continuously over dense real time makes it unrealistic to lift the definitions of equivalence used in the digital domain to the analog/mixed-signal domain. Thus, equivalence between infinite-state systems such as analog and mixed-signal (AMS) circuits has traditionally been expressed in terms of domain-specific features or behavioral signatures. This paper formalizes the definition of feature-based equivalence and presents a simulation-based framework for monitoring feature-based equivalence. The proposed methodology is illustrated on various AMS circuit families.

Antara Ain, Sayandeep Sanyal, Pallab Dasgupta
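As a hypothetical example of feature-based equivalence, two simulated AMS waveforms can be judged equivalent with respect to a single feature, here the 10%–90% rise time, if the extracted feature values agree within a relative tolerance. The choice of feature, the 5% tolerance, and the function names are assumptions for illustration. The paper's formal definition and monitoring framework are not reproduced.

```python
# Hypothetical feature-based comparison using 10%-90% rise time on sampled traces.

def crossing_time(t, v, level):
    """First upward crossing of `level`, linearly interpolated; None if it never crosses."""
    for i in range(1, len(v)):
        if v[i - 1] < level <= v[i]:
            frac = (level - v[i - 1]) / (v[i] - v[i - 1])
            return t[i - 1] + frac * (t[i] - t[i - 1])
    return None

def rise_time(t, v):
    """10%-90% rise time relative to the waveform's own min and max."""
    lo, hi = min(v), max(v)
    t10 = crossing_time(t, v, lo + 0.1 * (hi - lo))
    t90 = crossing_time(t, v, lo + 0.9 * (hi - lo))
    if t10 is None or t90 is None:
        return None
    return t90 - t10

def feature_equivalent(trace_a, trace_b, rel_tol=0.05):
    """Each trace is a (times, values) pair; equivalent if rise times agree within rel_tol."""
    ra, rb = rise_time(*trace_a), rise_time(*trace_b)
    if ra is None or rb is None:
        return False
    return abs(ra - rb) <= rel_tol * max(abs(ra), abs(rb))
```

The two traces may differ in amplitude or sampling and still be equivalent with respect to this feature. That is the point of comparing behavioral signatures rather than raw values.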
xMAS Based Accurate Modeling and Progress Verification of NoCs

Networks on Chip (NoCs) play a significant role in improving computation speed in Tiled Chip Multiprocessors (TCMPs) by acting as efficient interconnection networks between the tiles. Designing an efficient NoC that satisfies all important functional properties is challenging. Crucial properties for the correct and efficient functioning of a NoC include progress, mutual exclusion, starvation freedom, deadlock freedom, congestion freedom, and livelock freedom. Such system properties can be checked exhaustively by formal verification. In existing verification work, NoCs are modeled at an abstract level, so the verified properties are not guaranteed to hold in real hardware. In this work, we model a NoC router using Executable Micro Architectural Specification (xMAS) primitives so that the model is close to the register transfer level (RTL). On this model, we verify the progress property with the NuSMV model checker. Experimental results show that our model scales for progress verification in mesh and ring topologies.

Surajit Das, Chandan Karfa, Santosh Biswas
Faulty TSVs Identification in 3D IC Using Pre-bond Testing

Through-silicon-via (TSV) based three-dimensional integrated circuits (3D ICs) are attracting considerable attention in the semiconductor industry. 3D ICs go through a complex manufacturing process, and testing TSVs is a critical issue for researchers. This paper presents an efficient solution for pre-bond TSV testing. The proposed method generates a sequence of test sessions that identifies defective TSVs in a TSV network with reduced test time. Simulation results show that the proposed method reduces test time compared with prior work.

Dilip Kumar Maity, Surajit Kumar Roy, Chandan Giri
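The abstract does not describe how the test sessions are constructed. One generic way to schedule sessions that locate faulty TSVs is adaptive group testing. In this approach, a session tests a group of TSVs together and fails if any TSV in the group is defective, and failing groups are split and retested. The sketch below uses simple binary splitting with a hypothetical `is_faulty` oracle standing in for a session's pass/fail result. Real pre-bond networks restrict which TSVs can share a session, which this sketch ignores.

```python
# Hypothetical adaptive group testing (binary splitting) to locate faulty TSVs.

def identify_faulty(tsvs, is_faulty):
    """Return (faulty TSVs, number of test sessions used)."""
    sessions = 0
    faulty = []

    def session(group):
        nonlocal sessions
        sessions += 1
        return any(is_faulty(t) for t in group)  # a group fails if any member is faulty

    def search(group):
        if not session(group):
            return
        if len(group) == 1:
            faulty.append(group[0])
            return
        mid = len(group) // 2
        search(group[:mid])
        search(group[mid:])

    search(list(tsvs))
    return faulty, sessions
```

When defects are rare, most groups pass on the first session. The number of sessions then grows roughly with the number of faults times log2 of the group size, rather than with the total number of TSVs.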
Metadata
Title
VLSI Design and Test
Edited by
Brajesh Kumar Kaushik
Sudeb Dasgupta
Virendra Singh
Copyright year
2017
Publisher
Springer Singapore
Electronic ISBN
978-981-10-7470-7
Print ISBN
978-981-10-7469-1
DOI
https://doi.org/10.1007/978-981-10-7470-7
