2006 | Book

FPGA Implementations of Neural Networks

Edited by: Amos R. Omondi, Jagath C. Rajapakse

Publisher: Springer US

About this book

During the 1980s and early 1990s there was significant work in the design and implementation of hardware neurocomputers. Nevertheless, most of these efforts may be judged to have been unsuccessful: at no time have hardware neurocomputers been in wide use. This lack of success may be largely attributed to the fact that earlier work was almost entirely aimed at developing custom neurocomputers, based on ASIC technology, but for such niche areas this technology was never sufficiently developed or competitive enough to justify large-scale adoption. On the other hand, gate-arrays of the period mentioned were never large enough nor fast enough for serious artificial-neural-network (ANN) applications. But technology has now improved: the capacity and performance of current FPGAs are such that they present a much more realistic alternative. Consequently neurocomputers based on FPGAs are now a much more practical proposition than they have been in the past. This book summarizes some work towards this goal and consists of 12 papers that were selected, after review, from a number of submissions. The book is nominally divided into three parts: Chapters 1 through 4 deal with foundational issues; Chapters 5 through 11 deal with a variety of implementations; and Chapter 12 looks at the lessons learned from a large-scale project and also reconsiders design issues in light of current and future technology.

Table of contents

Frontmatter
Chapter 1. FPGA Neurocomputers
Abstract
This introductory chapter reviews the basics of artificial-neural-network theory, discusses various aspects of the hardware implementation of neural networks (in both ASIC and FPGA technologies, with a focus on special features of artificial neural networks), and concludes with a brief note on performance-evaluation. Special points are the exploitation of the parallelism inherent in neural networks and the appropriate implementation of arithmetic functions, especially the sigmoid function. With respect to the sigmoid function, the chapter includes a significant contribution.
Amos R. Omondi, Jagath C. Rajapakse, Mariusz Bajger
Chapter 2. On the Arithmetic Precision for Implementing Back-Propagation Networks on FPGA: A Case Study
Abstract
Artificial Neural Networks (ANNs) are inherently parallel architectures which represent a natural fit for custom implementation on FPGAs. One important implementation issue is to determine the numerical precision format that allows an optimum tradeoff between precision and implementation area. Standard single or double precision floating-point representations minimize quantization errors while requiring significant hardware resources. Less precise fixed-point representations may require fewer hardware resources but add quantization errors that may prevent learning from taking place, especially in regression problems. This chapter examines this issue and reports on a recent experiment where we implemented a multi-layer perceptron (MLP) on an FPGA using both fixed-point and floating-point precision. Results show that the fixed-point MLP implementation was over 12x greater in speed, over 13x smaller in area, and achieves far greater processing density compared to the floating-point FPGA-based MLP.
Medhat Moussa, Shawki Areibi, Kristian Nichols
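The precision tradeoff described in the Chapter 2 abstract can be illustrated with a minimal sketch of signed fixed-point quantization. The word length (16 bits), the fractional widths, and the function names are assumptions chosen for illustration; they are not taken from the chapter.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize x to a signed fixed-point integer code, saturating on overflow.

    frac_bits and total_bits are illustrative defaults, not the chapter's format.
    Note: Python's round() rounds exact ties to the nearest even integer.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = round(x * scale)
    return max(lowest, min(highest, code))


def from_fixed(code, frac_bits=8):
    """Convert a fixed-point integer code back to a float."""
    return code / (1 << frac_bits)


def quantization_error(x, frac_bits, total_bits=16):
    """Absolute error introduced by a round trip through fixed point."""
    return abs(x - from_fixed(to_fixed(x, frac_bits, total_bits), frac_bits))


if __name__ == "__main__":
    # Fewer fractional bits mean a larger rounding error for the same value.
    for bits in (4, 8, 12):
        print(bits, quantization_error(0.1, bits))
```

For in-range values the rounding error is bounded by half of the least significant bit, which is why small weight updates can vanish entirely when too few fractional bits are used.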
Chapter 3. FPNA: Concepts and Properties
Abstract
Neural networks are usually considered as naturally parallel computing models. But the number of operators and the complex connection graph of standard neural models cannot be handled by digital hardware devices. Though programmable digital hardware now stands as a real opportunity for flexible hardware implementations of neural networks, many area and topology problems arise when standard neural models are implemented onto programmable circuits such as FPGAs, so that the fast FPGA technology improvements cannot be fully exploited. The theoretical and practical framework first introduced in [21] reconciles simple hardware topologies with complex neural architectures, thanks to some configurable hardware principles applied to neural computation: Field Programmable Neural Arrays (FPNA) lead to powerful neural architectures that are easy to map onto FPGAs, by means of a simplified topology and an original data exchange scheme. This two-chapter study gathers the different results that have been published about the FPNA concept, as well as some unpublished ones. This first part focuses on definitions and theoretical aspects. Starting from a general two-level definition of FPNAs, all proposed computation schemes are described together and compared. Their correctness and partial equivalence are justified. The computational power of FPNA-based neural networks is characterized through the concept of underparameterized convolutions.
Bernard Girau
Chapter 4. FPNA: Applications and Implementations
Abstract
Neural networks are usually considered as naturally parallel computing models. But the number of operators and the complex connection graph of standard neural models cannot be handled by digital hardware devices. The Field Programmable Neural Arrays framework introduced in Chapter 3 reconciles simple hardware topologies with complex neural architectures, thanks to some configurable hardware principles applied to neural computation. This two-chapter study gathers the different results that have been published about the FPNA concept, as well as some unpublished ones. This second part shows how FPNAs lead to powerful neural architectures that are easy to map onto digital hardware: applications and implementations are described, focusing on a class of synchronous FPNA-derived neural networks, for which on-chip learning is also available.
Bernard Girau
Chapter 5. Back-Propagation Algorithm Achieving 5 Gops on the Virtex-E
Abstract
Back propagation is a well-known technique used in the implementation of artificial neural networks. The algorithm can be described essentially as a sequence of matrix-vector multiplications and outer product operations interspersed with the application of a pointwise nonlinear function. The algorithm is compute intensive and lends itself to a high degree of parallelism. These features motivate a systolic design of hardware to implement the back propagation algorithm. We present in this chapter a new systolic architecture for the complete back propagation algorithm. For a neural network with N input neurons, P hidden layer neurons and M output neurons, the proposed architecture, with P processors, has a running time of (2N + 2M + P + max(M,P)) for each training set vector. This is the first such implementation of the back propagation algorithm which completely parallelizes the entire computation of the learning phase. The array has been implemented on an Annapolis FPGA-based coprocessor and it achieves very favorable performance, in the range of 5 GOPS. The proposed new design targets Virtex boards.
We also describe the process of automatically deriving these high performance architectures using the systolic array design tool MMAlpha. This allows us to specify our system in a very high level language (Alpha) and perform design exploration to obtain architectures whose performance is comparable to that obtained using hand-optimized VHDL code.
Kolin Paul, Sanjay Rajopadhye
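The running-time expression quoted in the Chapter 5 abstract can be evaluated for a given network shape with a short sketch. The function name and the example layer sizes are hypothetical, and the abstract does not state the unit of the expression, so the result is labeled only as "steps" here.

```python
def systolic_steps(n_inputs, n_hidden, n_outputs):
    """Evaluate 2N + 2M + P + max(M, P) from the Chapter 5 abstract.

    N = input neurons, P = hidden-layer neurons (also the processor count),
    M = output neurons. The unit ("steps") is an assumption; the abstract
    gives only the expression.
    """
    n, p, m = n_inputs, n_hidden, n_outputs
    return 2 * n + 2 * m + p + max(m, p)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to show how the count scales.
    for shape in [(4, 3, 2), (64, 32, 10), (256, 128, 10)]:
        print(shape, systolic_steps(*shape))
```

The expression is linear in each layer size, whereas the number of weights, N·P + P·M, grows with the products of the layer sizes.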
Chapter 6. FPGA Implementation of Very Large Associative Memories
Application to Automatic Speech Recognition
Abstract
Associative networks have a number of properties, including a rapid, compute efficient best-match and intrinsic fault tolerance, that make them ideal for many applications. However, large networks can be slow to emulate because of their storage and bandwidth requirements. In this chapter we present a simple but effective model of association and then discuss a performance analysis we have done in implementing this model on a single high-end PC workstation, a PC cluster, and FPGA hardware.
Dan Hammerstrom, Changjian Gao, Shaojuan Zhu, Mike Butts
Chapter 7. FPGA Implementations of Neocognitrons
Abstract
This chapter describes the implementation of an artificial neural network on a reconfigurable parallel computer architecture built from FPGAs, named Reconfigurable Orthogonal Memory Multiprocessor (REOMP), which uses p² memory modules connected to p reconfigurable processors in row access mode and column access mode. It covers an alternative model of the Neocognitron neural network; the REOMP architecture and a case study of the alternative Neocognitron mapping; a performance analysis of computer systems with the number of processors varying from 1 to 64; the applications; and the conclusions.
Alessandro Noriaki Ide, José Hiroki Saito
Chapter 8. Self Organizing Feature Map for Color Quantization on FPGA
Abstract
This chapter presents an efficient architecture of Kohonen Self-Organizing Feature Map (SOFM) based on a new Frequency Adaptive Learning (FAL) algorithm which efficiently replaces the neighborhood adaptation function of the conventional SOFM. For scalability, a broadcast architecture is adopted with homogeneous synapses composed of a shift register, counter, accumulator and a special SORTING UNIT. The SORTING UNIT speeds up the search for neurons with minimal attributes. Dead neurons are reinitialized at preset intervals to improve their adaptation. The proposed SOFM architecture is prototyped on a Xilinx Virtex FPGA using the prototyping environment provided by XESS. A robust functional verification environment is developed for rapid prototype development. Rapid prototyping using FPGAs allows us to develop networks of different sizes and compare the performance. Experimental results show that it uses 12k slices and the maximum frequency of operation is 35.8 MHz for a 64-neuron network. A 512 × 512 pixel color image can be quantized in about 1.003 s at a 35 MHz clock rate without the use of subsampling. The Peak Signal to Noise Ratio (PSNR) of the quantized images is used as a measure of the quality of the algorithm and the hardware implementation.
Chip-Hong Chang, Menon Shibu, Rui Xiao
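The quality measure named in the Chapter 8 abstract, PSNR, can be sketched as follows using its standard definition, 10·log10(peak² / MSE). The 8-bit peak value of 255, the flat-list image representation, and the function names are assumptions made for illustration. The second helper only restates the arithmetic implied by the timing figures in the abstract.

```python
import math


def psnr(original, quantized, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    peak=255 assumes 8-bit channel values; this is an assumption, not a
    detail given in the abstract.
    """
    if not original or len(original) != len(quantized):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, quantized)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no quantization noise
    return 10 * math.log10(peak ** 2 / mse)


def cycles_per_pixel(seconds, clock_hz, width, height):
    """Average clock cycles spent per pixel for a given run time and clock."""
    return seconds * clock_hz / (width * height)


if __name__ == "__main__":
    print(psnr([10, 20, 30], [11, 21, 31]))
    # Figures quoted in the abstract: 1.003 s, 35 MHz, 512 x 512 pixels.
    print(cycles_per_pixel(1.003, 35e6, 512, 512))
```

With the abstract's figures, the second helper gives roughly 134 clock cycles per pixel on average.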
Chapter 9. Implementation of Self-Organizing Feature Maps in Reconfigurable Hardware
Abstract
In this chapter we discuss an implementation of self-organizing feature maps in reconfigurable hardware. Based on the universal rapid prototyping system RAPTOR2000 a hardware accelerator for self-organizing feature maps has been developed. Using state of the art Xilinx FPGAs, RAPTOR2000 is capable of emulating hardware implementations with a complexity of more than 15 million system gates. RAPTOR2000 is linked to its host — a standard personal computer or workstation — via the PCI bus. For the simulation of self-organizing feature maps a module has been designed for the RAPTOR2000 system, that embodies an FPGA of the Xilinx Virtex (-E) series and optionally up to 128 MBytes of SDRAM. A speed-up of up to 190 is achieved with five FPGA modules on the RAPTOR2000 system compared to a software implementation on a state of the art personal computer for typical applications of self-organizing feature maps.
Mario Porrmann, Ulf Witkowski, Ulrich Rückert
Chapter 10. FPGA Implementation of a Fully and Partially Connected MLP
Application to Automatic Speech Recognition
Abstract
In this work, we present several hardware implementations of a standard Multi-Layer Perceptron (MLP) and a modified version called eXtended Multi-Layer Perceptron (XMLP). This extended version is an MLP-like feed-forward network with two-dimensional layers and configurable connection pathways. The interlayer connectivity can be restricted according to well-defined patterns. This aspect produces a faster and smaller system with similar classification capabilities. The presented hardware implementations of this network model take full advantage of this optimization feature. Furthermore the software version of the XMLP allows configurable activation functions and batched backpropagation with different smoothing-momentum alternatives. The hardware implementations have been developed and tested on an FPGA prototyping board. The designs have been defined using two different abstraction levels: register transfer level (VHDL) and a higher algorithmic-like level (Handel-C). We compare the two description strategies. Furthermore we study different implementation versions with diverse degrees of parallelism. The test bed application addressed is speech recognition. The implementations described here could be used for low-cost portable systems. We include a short study of the implementation costs (silicon area), speed and required computational resources.
Antonio Canas, Eva M. Ortigosa, Eduardo Ros, Pilar M. Ortigosa
Chapter 11. FPGA Implementation of Non-Linear Predictors
Application in Video Compression
Abstract
The paper describes the implementation of a systolic array for a non-linear predictor for image and video compression. We use a multilayer perceptron with a hardware-friendly learning algorithm. Until now, mask ASICs (full and semicustom) offered the preferred method for obtaining large, fast, and complete neural networks for designers who implement neural networks. Now, we can implement very large interconnection layers by using large Xilinx and Altera devices with embedded memories and multipliers alongside the projection used in the systolic architecture. These physical and architectural features, together with the combination of FPGA reconfiguration properties and a design flow based on generic VHDL, create a reusable, flexible, and fast method of designing a complete ANN on FPGAs. Our predictors, with training on the fly, are completely achievable on a single FPGA. This implementation works, both in recall and learning modes, with a throughput of 50 MHz in a Xilinx XC2V6000-BF957-6, reaching the necessary speed for real-time training in video applications and enabling more typical applications to be added to the image compression processing.
Rafael Gadea-Girones, Agustín Ramírez-Agundis
Chapter 12. The REMAP Reconfigurable Architecture: A Retrospective
Abstract
The goal of the REMAP project was to gain new knowledge about the design and use of massively parallel computer architectures in embedded real-time systems. In order to support adaptive and learning behavior in such systems, the efficient execution of Artificial Neural Network (ANN) algorithms on regular processor arrays was in focus. The REMAP-β parallel computer built in the project was designed with ANN computations as the main target application area. This chapter gives an overview of the computational requirements found in ANN algorithms in general and motivates the use of regular processor arrays for the efficient execution of such algorithms. REMAP-β was implemented using the FPGA circuits that were available around 1990. The architecture, following the SIMD principle (Single Instruction stream, Multiple Data streams), is described, as well as the mapping of some important and representative ANN algorithms. Implemented in FPGA, the system served as an architecture laboratory. Variations of the architecture are discussed, as well as scalability of fully synchronous SIMD architectures. The design principles of a VLSI-implemented successor of REMAP-β are described, and the paper is concluded with a discussion of how the more powerful FPGA circuits of today could be used in a similar architecture.
Lars Bengtsson, Arne Linde, Tomas Nordstrom, Bertil Svensson, Mikael Taveniku
Metadata
Title
FPGA Implementations of Neural Networks
Edited by
Amos R. Omondi
Jagath C. Rajapakse
Copyright year
2006
Publisher
Springer US
Electronic ISBN
978-0-387-28487-3
Print ISBN
978-0-387-28485-9
DOI
https://doi.org/10.1007/0-387-28487-7
