
About this Book

This book describes the current state of the art in big-data analytics from a technology and hardware architecture perspective. The presentation is designed to be accessible to a broad audience with general knowledge of hardware design and some interest in big-data analytics. Coverage includes emerging technologies and devices for data analytics, circuit design for data analytics, and architectures and algorithms to support data analytics. Readers will benefit from the realistic context used by the authors, which demonstrates what works, what doesn’t, and what the fundamental problems, solutions, upcoming challenges, and opportunities are.

Provides a single-source reference to hardware architectures for big-data analytics;

Covers various levels of big-data analytics hardware design abstraction and flow, from devices to circuits and systems;

Demonstrates how hardware platforms based on non-volatile memory (NVM) can be a viable solution to existing challenges in hardware architecture for big-data analytics.



State-of-the-Art Architectures and Automation for Data-Analytics


Chapter 1. Scaling the Java Virtual Machine on a Many-Core System

To leverage effectively the abundant parallelism provided by large many-core enterprise servers, Java applications, libraries, and the virtual machine must be architected carefully to avoid single-thread bottlenecks. Among the solutions to the challenges of scaling a single Java Virtual Machine (JVM), we discuss the most rewarding ones: converting shared data objects into independent per-thread objects, applying scalable memory allocators, using appropriate concurrency frameworks, parallelizing garbage collection phases, and enhancing data affinity to CPUs through a NUMA-aware garbage collector. In this chapter, we use the LArge Memory Business Data Analytics (LAMBDA) workload to illustrate the importance of various scaling bottlenecks and to demonstrate the performance gain from the discussed solutions.

Karthik Ganesan, Yao-Min Chen, Xiaochen Pan
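As a rough illustration of the first technique named in the Chapter 1 abstract (converting shared data objects into per-thread objects), the following Python sketch contrasts a single lock-protected counter with per-thread counters that are merged once at the end. It is a hypothetical example, not code from the chapter or from LAMBDA; the function names are invented for illustration, and because of Python's global interpreter lock it shows the structure of the change rather than a JVM-style speedup.

```python
import threading
from collections import Counter


def count_shared(chunks):
    """Baseline: every thread updates one shared Counter guarded by a single lock."""
    shared = Counter()
    lock = threading.Lock()

    def worker(chunk):
        for item in chunk:
            with lock:  # all threads contend on this one lock
                shared[item] += 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


def count_per_thread(chunks):
    """Per-thread objects: each thread fills a private Counter, merged once at the end."""
    partials = [Counter() for _ in chunks]

    def worker(index, chunk):
        local = partials[index]  # touched by exactly one thread, so no lock is needed
        for item in chunk:
            local[item] += 1

    threads = [threading.Thread(target=worker, args=(i, c)) for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = Counter()
    for partial in partials:
        total.update(partial)  # single-threaded merge after all workers finish
    return total
```

In Java, the same pattern is commonly expressed with java.lang.ThreadLocal, or with java.util.concurrent.atomic.LongAdder, which spreads contended updates across internal cells and sums them on read.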

Chapter 2. Accelerating Data Analytics Kernels with Heterogeneous Computing

Heterogeneous computing platforms that combine general-purpose processing elements with different accelerators (such as GPUs or FPGAs) are ideally suited for efficient processing of compute-intensive data analytics kernels. In this chapter, we focus on accelerating data analytics kernels on heterogeneous computing systems with FPGAs. The adoption of FPGAs for data analytics is hindered by the difficulty of programming such systems, given the increasing complexity of FPGA-based accelerators. This makes high-level synthesis (HLS) an attractive solution for improving designer productivity by raising the programming effort above the register-transfer level (RTL). HLS offers various architectural design options with different trade-offs via pragmas (loop unrolling, loop pipelining, array partitioning). However, because each HLS run takes non-negligible time, exhaustive HLS-based architectural exploration of kernel implementations, whether manual or automated, is practically infeasible. To address this challenge, we have developed Lin-Analyzer, an accurate high-level performance analysis tool that enables rapid design space exploration with various pragmas for FPGA-based accelerators without requiring RTL implementations. We show how Lin-Analyzer enables easy yet performance-efficient implementation of computational kernels from a variety of data analytics applications on FPGA-based heterogeneous systems.

Guanwen Zhong, Alok Prakash, Tulika Mitra
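To give a feel for why the pragmas in Chapter 2 create a large design space, the Python sketch below applies the common first-order model of a pipelined loop: latency is roughly the pipeline depth plus (iterations minus 1) times the initiation interval (II), with unrolling dividing the iteration count. This is a simplified textbook estimate, not Lin-Analyzer's analysis; the function names and the assumption that unrolling leaves depth and II unchanged (enough functional units and memory ports, for example via array partitioning) are hypothetical.

```python
import math


def sequential_loop_latency(trip_count, depth):
    """Cycles for a non-pipelined loop: each iteration waits for the previous one."""
    return trip_count * depth


def pipelined_loop_latency(trip_count, depth, ii, unroll=1):
    """First-order cycle estimate for a pipelined (and optionally unrolled) loop.

    trip_count: iterations of the original loop
    depth:      cycles for one iteration to traverse the pipeline
    ii:         initiation interval, cycles between successive iteration starts
    unroll:     unroll factor; assumes enough functional units and memory ports
                (e.g. via array partitioning) that depth and ii stay unchanged
    """
    iterations = math.ceil(trip_count / unroll)
    return depth + (iterations - 1) * ii


if __name__ == "__main__":
    print("no pipelining:", sequential_loop_latency(1024, 8))  # 8192
    for u in (1, 2, 4, 8):
        print("unroll", u, "->", pipelined_loop_latency(1024, 8, ii=1, unroll=u))
    # unroll 1 -> 1031, unroll 2 -> 519, unroll 4 -> 263, unroll 8 -> 135
```

In a real flow, raising the unroll factor also consumes more FPGA resources and may force a larger II when memory ports run out, which is exactly the kind of trade-off a tool like Lin-Analyzer is meant to evaluate quickly.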

Chapter 3. Least-squares-solver Based Machine Learning Accelerator for Real-time Data Analytics in Smart Buildings

Real-time data analytics based on machine learning algorithms is challenging for smart-building energy management systems. This chapter presents a fast machine-learning accelerator for real-time data analytics in a smart micro-grid of buildings. A compact yet fast incremental least-squares-solver-based learning algorithm is developed for IoT hardware with limited computational resources. The compact accelerator, mapped onto an FPGA, can perform real-time data analytics that takes occupant behavior into account and continuously updates the prediction model with newly collected data. Experimental results show that the proposed accelerator achieves comparable forecasting accuracy with average speed-ups of 4.56× and 89.05× for load forecasting, compared with general-purpose CPU and embedded CPU implementations, respectively.

Hantao Huang, Hao Yu
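The Chapter 3 abstract does not spell out its incremental least-squares solver, so the Python sketch below substitutes the standard recursive least-squares (RLS) update, which likewise refines a linear prediction model one sample at a time without re-solving from scratch. Treat it as an assumption-laden illustration: the feature vector, the forgetting factor lam, the initialization delta, and the function names are hypothetical, and the chapter's FPGA accelerator may use a different formulation.

```python
def rls_init(n, delta=1000.0):
    """Zero initial weights and P = delta * I (a large delta means a weak prior)."""
    w = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    return w, P


def rls_update(w, P, x, y, lam=1.0):
    """One recursive least-squares step for a new sample (feature vector x, target y).

    lam is a forgetting factor: 1.0 keeps all history, values below 1 favor recent data.
    Each step costs O(n^2) and needs no explicit matrix inversion.
    """
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [v / denom for v in Px]                     # gain vector
    err = y - sum(w[i] * x[i] for i in range(n))    # prediction error before the update
    w = [w[i] + k[i] * err for i in range(n)]
    # P <- (P - k x^T P) / lam, using x^T P = (P x)^T because P is symmetric
    P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return w, P


if __name__ == "__main__":
    # Hypothetical data: a load that grows linearly, y = 2 + 3 t, features [1, t]
    w, P = rls_init(2)
    for t in range(20):
        w, P = rls_update(w, P, [1.0, float(t)], 2.0 + 3.0 * t)
    print(w)  # close to [2.0, 3.0]
```

With lam below 1, older samples are gradually down-weighted, which is one way such a model could track drifting load patterns like changing occupant behavior.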

Chapter 4. Compute-in-Memory Architecture for Data-Intensive Kernels

Energy efficiency has emerged as a major barrier to system performance and scalability, especially for applications that process large datasets. These data-intensive kernels differ from compute-intensive kernels in that increasing processor performance through parallel execution and technology scaling is unlikely to improve their energy efficiency sufficiently. This chapter describes two embodiments of a novel, reconfigurable memory-based computing architecture designed to handle data-intensive kernels in a scalable and energy-efficient manner, suitable for next-generation systems.

Robert Karam, Somnath Paul, Swarup Bhunia

Chapter 5. New Solutions for Cross-Layer System-Level and High-Level Synthesis

The rise of the Internet of Things (billions of internet-connected sensors constantly monitoring the physical environment) has coincided with the rise of big data and of advanced data analytics that can effectively gather and analyze data, generate insights from it, and support decision making. Data analytics allows analysis and optimization of massive datasets: deep analysis has led to advancements in business operations optimization, natural language processing, and computer vision applications such as object classification. Furthermore, data-processing platforms such as Apache Hadoop (White, Hadoop: the definitive guide. O’Reilly Media, Sebastopol, 2009) have become primary datacenter applications, but the rise of massive data processing also has a major impact on the increasing demand for both datacenter computation and data processing in edge devices to improve the scalability of massive sensing applications.

Wei Zuo, Swathi Gurumani, Kyle Rupnow, Deming Chen

Approaches and Applications for Data Analytics


Chapter 6. Side Channel Attacks and Their Low Overhead Countermeasures on Residue Number System Multipliers

Owing to its natural parallelism and speed, the Residue Number System (RNS) has been introduced to perform modular multiplications in public-key cryptography. In this work, we examine the security of RNS under side-channel attacks, expose its vulnerabilities, and propose corresponding countermeasures. The proposed methods improve resistance against side-channel attacks without significant area overhead or loss of speed, and are compatible with other countermeasures at both the logic and the algorithm level. We prototype the proposed design on an FPGA, and the implementation results confirm the efficiency of the proposed countermeasures.

Gavin Xiaoxu Yao, Marc Stöttinger, Ray C. C. Cheung, Sorin A. Huss
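The parallelism that the Chapter 6 abstract attributes to RNS comes from splitting an integer into independent residues. The Python sketch below shows plain RNS multiplication with reconstruction via the Chinese Remainder Theorem; the moduli and function names are illustrative assumptions. Cryptographic RNS multipliers typically add a Montgomery-style reduction with base extension to reduce modulo the cryptographic modulus, and this sketch includes neither that step nor any side-channel countermeasure.

```python
from math import prod


def to_rns(x, moduli):
    """Represent integer x by its residues modulo each (pairwise coprime) modulus."""
    return [x % m for m in moduli]


def rns_mul(a, b, moduli):
    """Multiply channel by channel; channels are independent, hence parallel in hardware."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]


def from_rns(residues, moduli):
    """Reconstruct the unique integer in [0, M) with the Chinese Remainder Theorem."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse via pow(..., -1, m), Python 3.8+
    return x % M


if __name__ == "__main__":
    moduli = [7, 11, 13, 17]          # illustrative pairwise-coprime base, M = 17017
    a, b = 123, 97                    # product 11931 < M, so it is recovered exactly
    product = rns_mul(to_rns(a, moduli), to_rns(b, moduli), moduli)
    print(from_rns(product, moduli))  # 11931
```

The result is only correct while the true product stays below the dynamic range M, which is why practical designs choose a base whose product exceeds the largest intermediate value.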

Chapter 7. Ultra-Low-Power Biomedical Circuit Design and Optimization: Catching the Don’t Cares

To reduce healthcare cost while simultaneously delivering high-quality health services, developing new portable and/or implantable biomedical devices is of great importance for both health monitoring and clinical treatment. In this chapter, we describe a radically new framework for ultra-low-power biomedical circuit design and optimization. The proposed framework seamlessly integrates data processing algorithms and their customized circuit implementations for co-optimization. The efficacy of the proposed framework is demonstrated by a case study of brain–computer interface (BCI).

Xin Li, Ronald D. (Shawn) Blanton, Pulkit Grover, Donald E. Thomas

Chapter 8. Acceleration of MapReduce Framework on a Multicore Processor

The MapReduce framework is widely used in massive data processing, for tasks such as financial prediction and online marketing. A multicore processor is a good platform for implementing MapReduce because of its inherent parallelism and flexibility. This chapter extracts the features of MapReduce applications and proposes a software–hardware co-design framework based on a multicore processor to improve the performance of MapReduce applications. Experimental results show that the MapReduce framework with hardware accelerators achieves a speedup of up to 40× over the pure software solution, and the proposed Topo-MapReduce achieves up to a further 29% speedup over the original MapReduce.

Lijun Zhou, Zhiyi Yu
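For readers new to the programming model behind Chapter 8, the Python sketch below walks through the three MapReduce phases on the canonical word-count example. It is a generic single-process illustration with hypothetical function names; it does not model the chapter's hardware accelerators or Topo-MapReduce.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # Each map_phase call depends only on its own split, so these calls are the
    # natural unit to distribute across cores.
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big hardware"]))
```

On a multicore host, the independent map calls could be dispatched with concurrent.futures.ProcessPoolExecutor.map; the shuffle step, which moves data between cores, is typically where communication cost concentrates.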

Chapter 9. Adaptive Dynamic Range Compression for Improving Envelope-Based Speech Perception: Implications for Cochlear Implants

The temporal envelope is the primary acoustic cue used in most cochlear implant (CI) devices to elicit speech perception in implanted patients. Owing to biological constraints, a compression scheme is required to map the wide dynamic range (DR) of input signals to a suitable level. Static envelope compression (SEC) is a well-known strategy in CI speech processing, in which a fixed compression ratio is used to narrow the envelope DR. More recently, a novel adaptive envelope compression (AEC) strategy has been proposed. Compared with the SEC strategy, the AEC strategy more effectively enhances the modulation depth of the envelope waveforms to make the best use of the DR, in order to achieve higher intelligibility of envelope-based speech. In this chapter, we first introduce the theory and implementation procedures of the AEC strategy. We then present four sets of experiments designed to evaluate its performance. In the first and second experiments, we investigate AEC performance under two types of challenging listening conditions: noisy and reverberant. In the third experiment, we explore the correlation between the AEC adaptation rate and the intelligibility of envelope-compressed speech. In the fourth experiment, we investigate the compatibility of the AEC strategy with a noise reduction (NR) method, another important facet of a CI device. The AEC-processed sentences yielded higher intelligibility scores under challenging listening conditions than the SEC-processed sentences. Moreover, the adaptation rate was an important factor in producing envelope-compressed speech with optimal intelligibility. Finally, the AEC strategy could be integrated with NR methods to further improve speech intelligibility scores under noisy conditions.
The results from the four experiments imply that the AEC strategy has great potential to provide better speech perception performance than the SEC strategy, and can thus be suitably adopted in CI speech processors.

Ying-Hui Lai, Fei Chen, Yu Tsao

Emerging Technology, Circuits and Systems for Data-Analytics


Chapter 10. Neuromorphic Hardware Acceleration Enabled by Emerging Technologies

The explosion of big-data applications imposes severe challenges on the data processing speed and scalability of computing systems. The performance of the von Neumann machine is greatly hindered by the growing performance gap between the CPU and memory, motivating active research on new or alternative computing architectures. One important instance is the neuromorphic computing engine, which provides information processing capability within a compact and energy-efficient platform. Recently, many research efforts have investigated the use of the recently discovered memristor arrays in neuromorphic systems, owing to the similarity of memristors to biological synapses. In this chapter, we propose two neuromorphic system designs, based on feedback and feedforward methodologies, respectively. Simulation results and the corresponding analysis demonstrate favorable performance in terms of robustness and recognition accuracy.

Zheng Li, Chenchen Liu, Hai Li, Yiran Chen

Chapter 11. Energy Efficient Spiking Neural Network Design with RRAM Devices

Inspired by the human brain’s function and efficiency, neuromorphic computing offers a promising solution for a wide set of cognitive tasks, ranging from brain–machine interfaces to real-time classification. The spiking neural network (SNN), which encodes and processes information with bionic spikes, is an emerging neuromorphic model with great potential to drastically improve the energy efficiency of computing systems. However, the lack of an energy-efficient hardware implementation and the difficulty of training the model significantly limit the application of SNNs. In this chapter, we first introduce background knowledge on SNNs and metal-oxide resistive switching random-access memory (RRAM). We then compare different SNN training algorithms for real-world applications and demonstrate that the Neural Sampling method is much more effective than the other methods. We also explore performance and energy efficiency by building an energy-efficient SNN-based system for real-time classification with RRAM devices. We implement different SNN training algorithms, including Spike-Timing-Dependent Plasticity (STDP) and the Neural Sampling method. Our RRAM-based SNN systems for these two training algorithms show good power efficiency and recognition performance on real-time classification tasks such as MNIST digit recognition. Finally, we discuss a possible direction for further improving classification accuracy by boosting multiple SNNs.

Yu Wang, Tianqi Tang, Boxun Li, Lixue Xia, Huazhong Yang
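STDP, one of the two training algorithms named in the Chapter 11 abstract, adjusts a synaptic weight according to the relative timing of pre- and post-synaptic spikes. The Python sketch below implements the common pair-based exponential form of the rule and clips the weight to a bounded range, standing in for the limited conductance window of an RRAM device. The amplitudes, time constants, bounds, and function names are illustrative assumptions; the chapter's exact STDP variant may differ.

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for one pre/post spike pair (times in ms).

    Pre before post (dt > 0) potentiates; post before pre (dt < 0) depresses.
    The effect decays exponentially with the spike-time difference.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(w, pairs, w_min=0.0, w_max=1.0, **params):
    """Apply STDP for each (t_pre, t_post) pair, clipping after every update.

    The clipping models a device whose conductance cannot leave [w_min, w_max].
    """
    for t_pre, t_post in pairs:
        w = min(max(w + stdp_delta_w(t_pre, t_post, **params), w_min), w_max)
    return w
```

For example, apply_stdp(0.5, [(0.0, 5.0)] * 1000) saturates at the upper bound 1.0, because a causal pairing repeated many times keeps potentiating the synapse.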

Chapter 12. Efficient Neuromorphic Systems and Emerging Technologies: Prospects and Perspectives

Recent advances in machine learning, notably deep learning, have resulted in unprecedented success in a wide variety of recognition tasks, including vision, speech, and natural language processing. However, implementing such neural algorithms on conventional “von Neumann” architectures involves orders of magnitude more area and power than the biological brain. This is mainly attributed to the inherent mismatch between the computational units of such models (neurons and synapses) and the underlying CMOS transistors. In addition, these algorithms are highly memory-intensive and suffer from memory bandwidth limitations due to the large amount of data transferred between memory and computing units. Recent experiments in spintronics have opened up the possibility of implementing such computing kernels with single-device structures that can be arranged in crossbar architectures, resulting in a compact and energy-efficient “in-memory computing” platform. In this chapter, we review spintronic device structures, consisting of single-domain and domain-wall-motion-based devices, for mimicking neuronal and synaptic units. System-level simulations indicate a ~100× improvement in energy consumption for such spintronic implementations over a corresponding CMOS implementation across different computing workloads.

Abhronil Sengupta, Aayush Ankit, Kaushik Roy
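The crossbar-based in-memory computing mentioned in the Chapter 12 abstract rests on a simple physical relation: applying voltages to the rows of a resistive array makes each column collect the current I_j = Σ_i V_i·G_ij (Ohm's law per device, Kirchhoff's current law per column), which is a matrix-vector product computed where the weights are stored. The Python sketch below models an ideal crossbar; it is a generic illustration with hypothetical values and names, not a model of the spintronic devices studied in the chapter.

```python
def crossbar_mvm(voltages, conductances):
    """Ideal resistive-crossbar read-out.

    voltages:     voltage applied to each row (V)
    conductances: conductances[i][j] is the device conductance at row i, column j (S)
    Returns the current collected on each column (A): I_j = sum_i V_i * G_ij.
    Wire resistance, sneak paths, device variation and ADC effects are ignored.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 array: row voltages in volts, conductances in siemens
    currents = crossbar_mvm([0.1, 0.2], [[1e-6, 2e-6], [3e-6, 4e-6]])
    print(currents)  # approximately [7e-07, 1e-06]
```

Because conductances are non-negative, designs that need signed weights often use a differential pair of devices per weight and subtract the two column currents.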

Chapter 13. In-Memory Data Compression Using ReRAMs

Data compression is a key building block in the current age of information deluge. It is necessary for efficient storage management and effective use of communication bandwidth, and it ultimately helps refine data into information and knowledge. Given the growth of sensors and connected devices, the role of compression in data management is steadily growing in importance. Following the earliest computing abstractions, data is transferred between storage and computing blocks: any form of processing, including compression, must run in the computing block, and the results are returned to storage. This basic notion is challenged by the advent of several new technologies that support logic operations and storage on the same device. Consequently, researchers and commercial entities are studying in-memory computing platforms for their applicability in different scenarios, such as data encryption and on-chip machine learning. This chapter explores the implementation of a data compression algorithm on such an in-memory computing platform. We explain the building blocks of the in-memory computing architecture and the steps of a data compression algorithm, and show the mapping process step by step.

Debjyoti Bhattacharjee, Anupam Chattopadhyay

Chapter 14. Big Data Management in Neural Implants: The Neuromorphic Approach

In this chapter, we study the brain as a source of ‘big data’ and show how this impedes the scalability of implantable brain–machine interfaces for neuroprostheses. The tight power constraints of these systems prevent wireless transmission of thousands of channels of neural activity. Hence, extracting information from the raw data and transmitting only the compressed information is necessary for future implants. This chapter explores several ‘neuromorphic’ solutions for extracting relevant information: spike detection to extract action potentials from raw data, spike sorting to classify the shapes of the action potentials, and finally intention decoding to classify spatio-temporal spike trains into categories. We show that these schemes require more processing in the implant but can provide compression factors from 10 to 10⁵. Lastly, a neuromorphic mixed-signal circuit that performs intention decoding and provides maximum compression while dissipating sub-μW power is presented as a possible solution for future neural implants.

Arindam Basu, Chen Yi, Yao Enyi

Chapter 15. Data Analytics in Quantum Paradigm: An Introduction

In this introductory material, we discuss the basics of the quantum paradigm and how developments in that area may provide useful pointers in the domain of data analytics. We discuss the power of quantum computation relative to classical computation and try to present the implications of the arrival of several quantum technologies in practice. The prime concerns in data analytics are fast computation, fast communication, and data security. Among these issues, the main focus is naturally on computation, and the others follow. Better efficiency can be attained by discrete algorithms with improved (lower) time complexity, and it is now proven that there are quantum algorithms that are indeed much faster than their classical counterparts. However, such improvements may not be available in every domain of computation, and fabricating a commercial quantum computer remains elusive. We briefly outline the quantum paradigm in this material, with possible implications for several aspects of data analytics.

Arpita Maitra, Subhamoy Maitra, Asim K. Pal
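A standard textbook example of a proven quantum speedup, relevant to the Chapter 15 abstract, is Grover's search: finding one marked item among N unstructured items needs about (π/4)·√N quantum iterations, versus up to N queries classically. The Python sketch below only evaluates these query counts; it does not simulate a quantum computer, ignores error correction and oracle cost, and its function names are hypothetical. The gain here is quadratic rather than exponential, consistent with the abstract's caution that such improvements are not available in every domain.

```python
import math


def classical_queries_worst_case(n):
    """Unstructured search over n items: a classical algorithm may have to check all n."""
    return n


def grover_iterations(n):
    """Approximate optimal Grover iteration count for one marked item: floor(pi/4 * sqrt(n))."""
    return math.floor(math.pi / 4 * math.sqrt(n))


if __name__ == "__main__":
    n = 10**6
    print(classical_queries_worst_case(n), grover_iterations(n))  # 1000000 785
```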