A scalable neural chip with synaptic electronics using CMOS integrated memristors

Published 2 September 2013 © 2013 IOP Publishing Ltd
Citation: Jose M Cruz-Albrecht et al 2013 Nanotechnology 24 384011, DOI 10.1088/0957-4484/24/38/384011

Abstract

The design and simulation of a scalable neural chip with synaptic electronics using nanoscale memristors fully integrated with complementary metal–oxide–semiconductor (CMOS) is presented. The circuit consists of integrate-and-fire neurons and synapses with spike-timing dependent plasticity (STDP). The synaptic conductance values can be stored in memristors with eight levels, and the topology of connections between neurons is reconfigurable. The circuit has been designed using a 90 nm CMOS process with via connections to on-chip post-processed memristor arrays. The design has about 16 million CMOS transistors and 73 728 integrated memristors. We provide circuit level simulations of the entire chip performing neuronal and synaptic computations that result in biologically realistic functional behavior.

1. Introduction

Neurons in the cerebral cortex maintain thousands of input and output connections with other neurons, forming a dense network of connectivity. For example, the human cerebral cortex has approximately 10^10 neurons and 10^14 synaptic connections, with a synaptic density of around 10^10 synapses cm^−2 [1]. With the advent of nanotechnology, it is now possible to realize nanodevices that can be fabricated at similar synaptic densities to the cerebral cortex. This advancement combined with advances in CMOS process technologies has resulted in a growing interest in the research community in the design of large scale neuromorphic systems with synaptic electronics that mimic biology [27]. The goal in designing such large scale systems is to develop intelligent machines that are scalable to match biological systems in performance efficiency [2]. This poses some major challenges in designing such hardware including scalability, connectivity and synaptic density. Scalability implies that the circuits are expandable to support the emulation of mammalian brains in terms of synaptic and neuronal elements. Connectivity corresponds to the ability of the circuit design to accommodate a large number of synaptic connections between neurons. Synaptic density refers to the number of synapses that can be fabricated in a given circuit area.

The neuromorphic paradigm is well suited for nanoscale computation because of its massive parallelism and fault tolerance. Recent advances in the areas of nanodevices [8], nanocircuits [9, 10], nano-crossbar arrays [11, 12] and nanoimprint lithography [13, 14] look promising. Of particular interest is a nanowire crossbar junction with resistive material [15] that functions as a synapse. These junctions have the potential for very high density and low fabrication costs [16, 17]. There have been several recent attempts to develop neuromorphic systems [18, 19] that leverage this idea, based on a crossbar design with memristive synapses that perform synaptic computation based on STDP [20, 21]. The advantage of these ideas is that the synaptic computation and storage occurs at the synaptic junction, much like in biology. However, the crossbar design has issues with scalability and connectivity. Connection of N neurons with M neurons on each side of the crossbar requires N × M crosspoints (see [18] for example). This is impractical for three reasons. First, the number of neurons does not scale well with the number of synapses. For example, a million synapse crossbar can only support 2000 neurons. Second, fewer than 25% of the crosspoints are in use at any given time and the rest are idle. Third, the relatively high defect density [22] makes the hardware performance at very large scales uncertain. These issues have prevented designers from realizing truly scalable neuromorphic circuits based on nanodevices thus far.
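
The crossbar scaling argument can be checked with a few lines of arithmetic. The sketch below assumes a square crossbar (N = M), which is the case implied by the million synapse example; the function names are illustrative only and are not taken from any cited design.

```python
import math


def square_crossbar_neurons(n_crosspoints: int) -> int:
    """Neurons served by a square N x N crossbar: N on each side, 2N in total."""
    side = math.isqrt(n_crosspoints)  # N, assuming N == M
    return 2 * side


def neurons_per_synapse(n_crosspoints: int) -> float:
    """Ratio of neurons to crosspoints; it shrinks as the crossbar grows."""
    return square_crossbar_neurons(n_crosspoints) / n_crosspoints


print(square_crossbar_neurons(1_000_000))  # 2000
print(neurons_per_synapse(1_000_000))      # 0.002
print(neurons_per_synapse(100_000_000))    # 0.0002
```

Because the neuron count grows only with the square root of the number of crosspoints, a hundredfold increase in synapses yields only a tenfold increase in neurons.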

We began to address these challenges by integrating CMOS circuits with nanodevices based on memristors to enable large scale neuromorphic systems, where we exploit high density memristor arrays for analog synaptic conductance storage but perform synaptic computation using CMOS technology. The overall concept is illustrated in figure 1. The system is composed of three parts: the programmable front-end, the processing core and the synaptic conductance storage. The CMOS portion of the electronic circuit, which performs neuronal and synaptic computations including STDP learning, is referred to as the processing core. The synaptic storage is realized with an incremental memristor design that operates like a synapse: the conductance value of each memristor can be either increased or decreased by applying either positive or negative voltages between the two terminals of the memristor. The memristors are composed of a bottom electrode made of tungsten, a center portion of tungsten oxide and a top electrode that is made of palladium (see figure 7). Each memristor increments its memristance when the device is biased positively by a voltage pulse and vice versa. The magnitude of increment/decrement depends upon the voltage bias, the pulse width and the magnitude of the voltage pulse [24]. We have leveraged our knowledge of integrating memristors with CMOS electronics using a 180 nm CMOS process [25] to develop incremental memristor arrays with 3.5 bits (or 10 levels) of synaptic storage per memristor. This multibit memory integrated with a 90 nm CMOS processing core provides the required circuit elements to perform all the neuronal and synaptic computations to support the operation of very large scale neural architectures in electronics.

Figure 1. The neuromorphic hardware components to support large scale neural architectures are shown here. The hardware portion consists of a front-end with a compiler and digital memory, an analog core that houses neurons and synapses with STDP, and a nanoscale analog memory based on CMOS integrated memristors.

The system can be configured to support various neural architectures with a programmable front-end consisting of a neuromorphic compiler [26]. The compiler configures a set of digital switches that is part of a digital fabric [7, 26] for routing action potentials or spikes generated by neurons to other neurons based on the connections prescribed in the neural architecture. To meet the connectivity challenge of enabling more than 5000 connections of a single neuron to its peers [1], a novel synaptic time-multiplexing (STM) approach was designed [7, 26]. The key idea in STM is to exploit the difference in operating speed between electronics and mammalian brains and trade off space for speed of processing. To enable this, the physical connections between neurons are time multiplexed.

The synapses of the mammalian brain operate at lower speeds than individual electronic components, but have a synaptic density of about 10^10 synapses cm^−2, which is much higher than the density of transistors in a typical chip. However, the effective density of synapses in a chip can be increased by reusing a CMOS circuit to implement the operations of multiple synapses. We have adopted this strategy in designing our chip.

In mammalian brains the synapses process spike signals that have interspike intervals that are typically between 10 and 100 ms. However, a CMOS circuit can perform an equivalent synaptic operation and weight storage at a much higher speed. In our circuit, which uses a 90 nm CMOS process, a key synaptic operation, including the STDP weight update calculation and the storage of the weight in memory, can be performed in a time of the order of 10 μs. That is between 1000 and 10 000 times shorter than the duration of a typical biological interspike interval. This enables us to reuse (time multiplex) a single CMOS circuit for 100 to 1000 synapses [26]. In the current chip we use multiplexing ratios of up to 128. It should be noted that multiplexing ratios higher than 1000 can be achieved by using more advanced CMOS or memristor technologies.
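
The speed ratios quoted above follow from simple division. The short sketch below reproduces them, and also evaluates the ratio for the 100 μs time slot that the chip uses for time multiplexing (section 2). The function name is illustrative.

```python
def multiplexing_ratio(interspike_interval_s: float, op_time_s: float) -> int:
    """Number of operations of length op_time_s that fit in one interspike interval."""
    return round(interspike_interval_s / op_time_s)


OP_TIME_S = 10e-6     # order of 10 us per synaptic operation (from the text)
SLOT_TIME_S = 100e-6  # 100 us time slot used by the chip (section 2)

for isi_s in (10e-3, 100e-3):  # typical biological interspike intervals
    print(
        f"ISI {isi_s * 1e3:.0f} ms: "
        f"{multiplexing_ratio(isi_s, OP_TIME_S)} operations, "
        f"{multiplexing_ratio(isi_s, SLOT_TIME_S)} time slots"
    )
```

With a 100 μs slot, between 100 and 1000 slots fit in one interspike interval, which matches the 100 to 1000 synapses quoted for a single time-multiplexed circuit.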

In this paper we describe for the first time a neuromorphic system that combines all these elements, reconfigurable front-end, analog processing core and memristor-based synaptic storage, into a single system. We provide transistor level simulations of the entire chip with synaptic electronics performing neuronal and synaptic computations that result in a biologically realistic functional behavior. It should be noted that while one of the goals of the SyNAPSE project [2] is to achieve very high synaptic density and connectivity per neuron, in the current phase of the work we have a less aggressive approach so as to first ensure the feasibility of the chip given the complexity of interacting CMOS elements. This capability, however, provides the pathway for the design of very large scale neuromorphic systems in the future that can solve the scalability, connectivity and synaptic density challenges.

2. Chip architecture

A symbolic diagram of the neural chip and its architecture is shown in figure 2. The chip contains an array of processing nodes, as shown in figure 2(a). The actual chip has 576 nodes arranged in a regular 24 × 24 array. In the periphery of the node array there are CMOS I/O (input/output) circuits to interface data between the processing nodes and the chip pads.

Figure 2. (a) Top level diagram of the chip, (b) symbolic diagram of a node with processing core, digital memory, memristor memory and control circuitry.

A symbolic diagram of a node is shown in figure 2(b). It is composed of (i) a processing core with integrate-and-fire neurons, synapses and STDP circuits, (ii) analog memory based on memristors and (iii) routing channels that are used to enable communication between processing nodes. The chip architecture is modular since the processing node design allows for direct abutment of neighboring nodes to form a neural network. Due to the flexibility offered by programming of routing channels, it can also support communication between nodes that are both near and far away.

The analog memory of each node is composed of an array of memristors. In the version of the chip described here, there are 128 memristors in each node arranged in an 8 × 16 array. These memristors are used to store the synaptic conductance corresponding to the 128 synapses between a neuron in a single processing node and neurons in other nodes in the chip. There are multiplexer (MUX), analog-to-digital (ADC) and digital-to-analog (DAC) CMOS circuits that interface the memristor array with the processing core. The complete chip has a total of 73 728 nanoscale memristors and 16 million CMOS transistors.

The components of the processing core in each node are shown in figure 3. It is composed of a neuron, a synapse and a memory to store synaptic conductance. There is one neuron per node. The neuron is an integrate-and-fire type neuron. It integrates its input in an internal accumulator. When the integrated value reaches a threshold it resets the accumulator back to zero and produces an output spike.
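
The integrate-and-fire behavior described above can be sketched in a few lines. This is a behavioral illustration only, not the CMOS circuit of [23]: the unitless threshold of 1.0, the discrete time step and the absence of leakage are assumptions made for the example.

```python
class IntegrateAndFireNeuron:
    """Behavioral integrate-and-fire model: accumulate, compare, reset, spike."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold  # assumed, unitless
        self.accumulator = 0.0

    def step(self, input_value: float) -> bool:
        """Integrate one input sample; return True if an output spike is produced."""
        self.accumulator += input_value
        if self.accumulator >= self.threshold:
            self.accumulator = 0.0  # reset the accumulator back to zero
            return True
        return False


neuron = IntegrateAndFireNeuron(threshold=1.0)
spikes = [neuron.step(0.25) for _ in range(8)]
print(spikes)  # a spike on every fourth input
```

With a constant input of 0.25 per step the accumulator reaches the threshold on every fourth step, so the output spike rate is proportional to the input.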

Figure 3. Detail of the components of a processing core of a node showing the interaction between an integrate-and-fire neuron, synapses and STDP circuits. For details of the circuit implementation of these key blocks, the reader is referred to [23]. The core has two options for storage of synaptic conductances: memristor arrays and also an auxiliary SRAM. We highlight the memristor array option in this paper.

There is also one physical STDP circuit and one physical synapse circuit per node. These circuits are time multiplexed. This reduces the number of CMOS circuits that are required and also reduces the number of interconnections needed to transmit information between nodes [26, 27]. In our circuit each physical synapse and STDP circuit is multiplexed to implement N virtual synapses and STDP circuits.

The timing diagram used for synaptic time multiplexing (STM) is shown in figure 4. STM is performed by dividing each cycle into time slots [26] of 100 μs duration, for a total cycle time of up to N × 100 μs.

Figure 4. Timing diagram for synaptic time multiplexing.

During each 100 μs time slot the physical synapse is assigned to perform the function of one given virtual synapse. In the chip we designed, the maximum number of slots per cycle is 128. During a 12.8 ms cycle (which corresponds to 128 time slots) the physical synapse can implement 128 different virtual synapses. Time multiplexing requires the storage of one synaptic conductance per virtual synapse. This is accomplished by an array of 128 memristors. In each time slot one memristor is read, which provides the synaptic conductance of the corresponding virtual synapse. In addition, in each slot the value stored in that memristor is updated: the STDP circuit provides an update that is used to increment or decrement the currently stored synaptic conductance value. The memristors are accessed in a fixed order. During every STM cycle (see figure 4) a memristor of any given node is accessed once for reading and, if needed, once for writing an update.
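
The slot timing described above can be written as a small scheduling function. The sketch assumes that the fixed access order is simply slot 0, 1, ..., N − 1, so that the slot index doubles as the index of the memristor being accessed; the actual order used on the chip is not specified here.

```python
from typing import Tuple

SLOT_US = 100    # duration of one time slot, in microseconds
MAX_SLOTS = 128  # maximum number of slots per STM cycle


def stm_slot(t_us: int, n_slots: int = MAX_SLOTS) -> Tuple[int, int]:
    """Return (cycle index, slot index) for a time t_us given in microseconds."""
    if not 1 <= n_slots <= MAX_SLOTS:
        raise ValueError("n_slots must be between 1 and 128")
    global_slot = t_us // SLOT_US
    return global_slot // n_slots, global_slot % n_slots


def cycle_time_ms(n_slots: int = MAX_SLOTS) -> float:
    """Duration of one STM cycle in milliseconds."""
    return n_slots * SLOT_US / 1000


print(cycle_time_ms(128))  # 12.8
print(stm_slot(12_799))    # (0, 127): last slot of the first cycle
print(stm_slot(12_800))    # (1, 0): first slot of the second cycle
```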

3. Circuit details

The memristor array of each node interfaces to CMOS circuitry to select a memristor for read or write. A symbolic diagram of the memristor array is shown in figure 5(a). It is composed of 128 memristors with nanowires arranged in 16 rows and 8 columns. There are 24 vias in each node to interface the nanowires to CMOS circuitry, indicated as yellow circles in the symbolic diagram of figure 5(a). In each node the chip has one CMOS column circuit and a CMOS row circuit. These circuits are used to select at any one time one memristor of the array to perform either a read or a write operation.
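
Selecting one memristor out of 128 amounts to choosing one of 16 row nanowires and one of 8 column nanowires. The sketch below illustrates this addressing; the row-major numbering of the memristors is an assumption made for the example, and the "select"/"bias" labels only indicate that unselected nanowires are held at a bias voltage, as the row and column circuits of this section do.

```python
N_ROWS, N_COLS = 16, 8  # 16 + 8 = 24 nanowires, one via each


def decode(index: int):
    """Map a memristor index (0..127) to (row, column), assuming row-major order."""
    if not 0 <= index < N_ROWS * N_COLS:
        raise ValueError("memristor index out of range")
    return divmod(index, N_COLS)


def line_drive(index: int):
    """Return which row and column nanowires are selected and which are biased."""
    row, col = decode(index)
    rows = ["select" if r == row else "bias" for r in range(N_ROWS)]
    cols = ["select" if c == col else "bias" for c in range(N_COLS)]
    return rows, cols


rows, cols = line_drive(127)
print(decode(127))                            # (15, 7)
print(rows.count("bias"), cols.count("bias"))  # 15 7
```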

Figure 5. (a) Diagram of the memristor array inside one node with CMOS access circuitry, (b) detail of the CMOS circuitry connected to the rows of the memristor array, (c) detail of the CMOS circuitry connected to the columns of the memristor array, (d) example of eight typical IV characteristics of the memristor, (e) typical values of memristor currents when biased at 0.4 V and correspondence to synapse weight codes.

A simplified diagram of the CMOS row circuit is shown in figure 5(b). It is composed of a buffer, an ADC and a de-multiplexer (DeMUX). The DeMUX is used to connect the terminal named Vsel_row to one of the 16 terminals connected to memristor nanowires. The other 15 unselected nanowires are connected to a bias voltage, which is used to minimize leakage paths [25]. During a writing operation the voltage Vsel_row is produced by a CMOS pulse generator (not shown in the figure). A typical value for the pulse amplitude is 1.4 V and a typical pulse lasts for 14 μs. However, the chip can also be operated with other voltages, up to 3.3 V. We use thick-oxide transistors, which tolerate higher voltages, for the circuits involved in memristor writing. To read a memristor value, an amplifier and an ADC are used, as shown in figure 5(b). The amplifier is used to set an accurate reading voltage at Vsel_row. The amplifier has an extra terminal that provides a current equal to that flowing to the memristor. This current is digitized by an ADC to produce a 3 bit (eight-level) code.

Prior simulations and analysis of neural circuits with STDP show that a moderate number of bits of resolution per synapse is usually enough to emulate most behaviors [31]. In this paper, we have demonstrated that it is possible to learn tuning curves using a 3 bit synapse. Tuning curves are among the most commonly found neural responses in the cortex. In fact, the most frequently reported change during perceptual learning, where there is sparsening of neural activity, is a sharpening of a cell's tuning curve. That is, if a cell responds to several stimuli weakly and to one stimulus strongly, after learning it might respond only to the best stimulus [33]. We have shown in [34] that this basic ability to learn tuning curves can be exploited, for example, to design a self-organizing spiking neural model to learn spatio-motor transformations in simulated robots. We have also subsequently verified in our laboratory that it is possible to learn the same transformations even with 3 bit synapses. We believe that this feature can be extended to learn arbitrary multimodal maps as well. We have also shown similar results for another model that learns functional maps of the visual cortex such as orientation preference maps and ocular dominance maps based on STDP [35]. Here again the neural cells tend to develop tuning curves.

For the functional behaviors analyzed in this paper our simulations show that three bits of resolution (eight levels) is sufficient to have correct behavior. The present design has three bits of resolution for the memristors and CMOS circuitry. However, the memristor design can be modified to support ten or more levels, without increasing the number of memristors needed. The CMOS circuitry of the chip, with very minor modifications, could also be extended to support a higher number of levels.

A simplified circuit diagram for the CMOS column circuit is provided in figure 5(c). It is composed of a DeMUX that is used to connect the terminal named Vsel_col to one of eight terminals connected to the memristor nanowires. The other seven unselected nanowires are connected to other bias voltages. These bias voltages are used to minimize leakage paths [25].

For the incremental nanoscale memristors, we used the design based on the University of Michigan (UM) implementation. Thus, for transistor level simulations of our full scale chip, we use an incremental memristor model developed at UM corresponding to this design, as described in [30]. The model is composed of three nodes: one for each terminal of the memristor and an internal node that stores the state of the memristor. The model contains one capacitor, two non-linear voltage-controlled sources and a resistor. The reader is referred to [30] for further details of this memristor model.

The current levels during a memristor read operation at 0.4 V are shown in figure 5(e) while the simulated IV characteristics of the memristor are shown in figure 5(d). The plot is based on a simulation using CADENCE-AMS.

The layout of the chip with the 576 processing nodes is shown in figure 6(a). The detailed layout of a node is shown in figure 6(b). The size of each node including analog and digital circuits is 250 μm × 250 μm. The size of the chip, with 576 nodes and chip input/output circuitry, is 6.5 mm × 6.5 mm. The detail of the layout of a memristor array (with 128 memristors) of one node is shown in figure 6(c). A symbolic diagram of the detail of the lower left corner of a memristor array is shown in figure 6(d). This figure shows the layout of two CMOS wire segments, near the periphery of the memristor array, that carry the signals Vrow_1 and Vcol_1 generated by the CMOS row and column circuitry (see figures 5(b) and (c)). These wire segments are implemented in metal 8, which is a thick metal near the top of our 90 nm CMOS chip. Vias are post-processed on top of these metal segments for connection to nanowires. Nanowires and memristors are then post-processed on top of the CMOS. In the chip the vertical nanowires are connected to the bottom electrodes of the memristors, and the horizontal nanowires are connected to the top electrodes of the memristors.

Figure 6. Layout: (a) chip, (b) detail of one node, (c) detail of an MA (memristor array) inside one node, and (d) symbolic diagram of the lower left corner of a memristor array connected to the CMOS wiring.

A diagram of a memristor is shown in figure 7. Figure 7(a) shows a simplified top view of the memristor electrodes. The top electrode is made of palladium (or Pd) and the bottom electrode of tungsten. The memristor core is located at the intersection of the two electrodes. Figure 7(b) shows a cross-section of a memristor core that is composed of a layer of tungsten oxide. Further details of this memristor structure, which we are using for this chip, can be found in [24]. This memristor structure is compatible with CMOS. One previous example of a memristor structure compatible with CMOS technology is given in [32].

Figure 7. (a) Top view of a memristor showing the top and bottom electrodes, (b) Cross-section of the memristor.

Figure 8 shows an SEM photograph of an array of 16 × 8 memristors fabricated at HRL. At the edge of the array are vias from the electrodes to metal 8 of the CMOS circuit. Details of integration of memristors with CMOS can be found in [25]. It should be noted that the memristors in our array have a pitch of 10 μm, which is far less aggressive than what is needed to achieve biological synaptic densities. This conservative pitch was chosen to minimize risk in integrating various complex elements into one chip. However, we have shown in [25] that it is possible to achieve much higher synaptic densities with a pitch of around 100 nm. We plan to use these lessons to scale up our chip in future generations.
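
The effect of the pitch on the achievable density can be estimated with a one-line calculation. The sketch below assumes one memristor per pitch × pitch cell of a square grid and ignores the area of the CMOS circuitry, so it gives an upper bound for the memristor array alone.

```python
def array_density_per_cm2(pitch_m: float) -> float:
    """Memristors per square centimetre for a square grid with the given pitch."""
    pitch_cm = pitch_m * 100.0
    return 1.0 / pitch_cm ** 2


print(f"{array_density_per_cm2(10e-6):.1e}")   # 10 um pitch
print(f"{array_density_per_cm2(100e-9):.1e}")  # 100 nm pitch
```

Under these assumptions a 10 μm pitch corresponds to about 10^6 memristors cm^−2, while a 100 nm pitch corresponds to about 10^10 memristors cm^−2, which is comparable to the synaptic density of the cerebral cortex quoted in section 1.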

Figure 8. SEM photograph of a fabricated 16 × 8 memristor array.

The neuron, synapse and STDP CMOS circuits are described in [23] and will not be presented here for brevity.

The spike signals produced in a node can be routed to the synapse circuit of a different node. This is accomplished by CMOS routing circuitry. The routing circuitry is shown in figure 9. Figure 9(a) shows the detail of the routing fabric associated with one node. The routing circuit is composed of CMOS wires, buffer-based switches (both uni-directional and bi-directional) and memory. The state of each of these switches (either ON or OFF) is stored in a digital CMOS memory. All the nodes have the same hardware, but the switch positions can be programmed independently in each node, for each time slot of an STM cycle. A uni-directional switch is implemented by a buffer, as shown in figure 9(b). The buffer can be turned on or off according to a control line. The detail of a bi-directional switch is shown in figure 9(c). This switch is composed of two buffers. Only one of the two buffers may be set ON at a given time. The digital memory inside the node contains information about the control (ON or OFF) of all buffers of the node switches. The sequence of switch positions is identical in each STM cycle [26]. The memory stores the routing configuration for all the time slots of an STM cycle, as shown in figure 9(d). In the chip the memory has 34 columns and N rows. The data of a column of the memory are used to generate a control signal for a buffer for all N time slots. The digital memory is initialized with a user-defined network topology based on a neuromorphic compiler that was designed to handle arbitrary network topologies [26].
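
The organization of the routing memory can be illustrated with a small data structure: one row per time slot and one column per buffer control line. The class and method names below are hypothetical, and the sketch does not model which columns belong to the same bi-directional switch, since that assignment is not given here.

```python
N_CONTROL_COLUMNS = 34  # columns of the routing memory in each node


class RoutingMemory:
    """N rows (time slots) by 34 columns (buffer ON/OFF controls) for one node."""

    def __init__(self, n_slots: int):
        self.n_slots = n_slots
        self.bits = [[False] * N_CONTROL_COLUMNS for _ in range(n_slots)]

    def set_switch(self, slot: int, column: int, on: bool) -> None:
        """Program one buffer control bit for one time slot."""
        self.bits[slot][column] = on

    def controls_for_slot(self, slot_counter: int):
        """Control word applied in a slot; the sequence repeats every STM cycle."""
        return tuple(self.bits[slot_counter % self.n_slots])

    def column(self, column: int):
        """ON/OFF sequence of one buffer over all N time slots."""
        return [row[column] for row in self.bits]


memory = RoutingMemory(n_slots=16)
memory.set_switch(slot=3, column=5, on=True)
print(memory.controls_for_slot(3)[5])       # True
print(memory.controls_for_slot(16 + 3)[5])  # True: same slot in the next cycle
print(memory.column(5).count(True))         # 1
```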

Figure 9. (a) Routing fabric of one node. (b) Symbol and terminals of a switch based on one buffer with on/off control. (c) Symbol and circuitry of a bi-directional switch. (d) Detail of memory to store the connectivity.

4. Simulations

The neural chip can be programmed to implement different network architectures. A simple example of a network used to illustrate the operation of the chip is shown in figure 10. It is composed of a neuron that has 16 synapses. The weight of each synapse is internally controlled by a CMOS based STDP circuit. The network of figure 10 can distinguish whether several of its inputs are correlated to each other [28, 29].

Figure 10. (a) Network used for simulations. (b) In the network the 16 rows correspond to the time slots while the 34 columns correspond to the switches that are set in the chip fabric for each neuron type. Black represents an OFF state while white represents an ON state.

The simulation of the chip that implements the network of figure 10 begins by initializing the digital memory to route spikes between neurons by setting the switch states in the node, as shown in figure 10(b). The top plots of figure 11(a) show the inputs provided to the chip in the form of spike trains. A set of eight different input spike trains, not correlated to each other, is applied to synapses 9–16. One additional spike train signal (shown as In1−8) is used as a common input to synapses 1–8. In this simulation synapses 1–8 receive identical inputs, perfectly correlated to each other. This network can be used to determine which inputs are correlated or uncorrelated to each other.

Figure 11. Simulation waveforms: (a) synapse inputs and neuron output, (b) synaptic conductance values of 16 synapses.

The bottom plot of figure 11(a) shows the output produced by the neuron during the simulation of the chip. The inputs and the neuron output are used by an STDP circuit (see figure 3) to generate the updates to the synaptic conductance values. In the simulation there are 16 weights that are stored in 16 memristors. The time evolution of the 16 weights, denoted as w1,1 to w1,16, is shown in figure 11(b). They are stored in memristors M1,1 to M1,16 within one node of the chip. During the chip operation the memristors of a node are accessed cyclically. Each access operation consists of (a) memristor read, (b) calculation of weight increment or decrement by the CMOS STDP circuit, and (c) write of a change in the memristor. The writing is performed only if there is a change (i.e., different from zero).

During a 100 μs time slot one memristor of a node is accessed once. During a 1.6 ms STM cycle, 16 memristors of the node are accessed once. The plots of figure 11(b) show the simulation of the weights for 0.3 s of operation of the chip. In the simulation there are on average 187 access operations performed on each of the 16 memristors.
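
The number of access operations quoted above follows from the cycle time: with 16 slots of 100 μs each, every memristor is accessed once per 1.6 ms. A minimal check, with an illustrative function name:

```python
def accesses_per_memristor(sim_time_s: float, n_slots: int, slot_s: float = 100e-6) -> float:
    """Average number of accesses to each memristor during a simulation."""
    cycle_s = n_slots * slot_s  # one access per memristor per STM cycle
    return sim_time_s / cycle_s


print(int(accesses_per_memristor(0.3, n_slots=16)))  # 187
```

The exact quotient is 187.5, which is consistent with the average of 187 complete access operations per memristor in 0.3 s.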

The vertical axes of the plots of figure 11(b) represent the code for the synaptic conductance value. This code is produced by the ADC of figure 5(b) and ranges from 0 to 7 in steps of 1. It can be observed, according to the chip simulation, that after 0.3 s the weights w1,1 to w1,8, which are associated with synapses receiving correlated inputs, all tend to a high value. The weights w1,9 to w1,16, associated with synapses receiving uncorrelated inputs, all tend to a low value. This is the desired behavior, which matches the results observed using idealized neural models [28, 29]. Note that even though synapses 1–8 receive identical input, the internal time multiplexed STDP circuit senses this common input at different time slots for different synapses. This has the effect of introducing very small timing differences, of the order of a few 100 μs, in the sensed timing of the input spikes from one synapse to another. This sensed timing is used for the STDP update calculation of different synapses. These very small differences in the timing (typically less than 0.1 ms, which is just 0.5% of the typical 20 ms interspike interval) can very occasionally result in STDP updates that are slightly different from synapse to synapse. Therefore the weight waveforms of synapses 1–8 during the transient are sometimes not exactly identical. These very small differences in some of the transient weight waveforms of synapses 1–8 are expected and do not prevent the circuit from exhibiting proper behavior.

The details of a single memristor read operation during this simulation are shown in figure 12. The key voltages provided by the CMOS circuit are shown in figure 12(a). The waveform labeled as P represents the voltage applied to the positive terminal of the memristor and is 0.4 V in this simulation. It can also be programmed to be a different value. The line labeled as N represents the voltage applied to the negative terminal of the memristor. It is zero during a read operation. The dotted black line represents a control signal to enable the operation of the ADC circuit to digitize the current of the memristor to one of eight possible weight codes. The read operation lasts 4 μs. The current through the memristor during a read operation is shown in figure 12(b). During read operation the currents range from 2 to 16 μA.
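
The conversion from read current to weight code can be illustrated with a simple quantizer. The sketch assumes eight uniformly spaced levels between 2 and 16 μA (a 2 μA step), with code 0 at 2 μA and code 7 at 16 μA; the actual correspondence between currents and codes is the one given in figure 5(e), which is not reproduced here.

```python
I_MIN_UA = 2.0   # lowest read current (from the text)
I_MAX_UA = 16.0  # highest read current (from the text)
N_LEVELS = 8     # 3 bit code
STEP_UA = (I_MAX_UA - I_MIN_UA) / (N_LEVELS - 1)  # 2.0 uA, assumed uniform


def read_current_ua(conductance_us: float, v_read: float = 0.4) -> float:
    """Read current in uA for a conductance in uS at the read voltage (Ohm's law)."""
    return conductance_us * v_read


def current_to_code(i_ua: float) -> int:
    """Quantize a read current to the nearest weight code, clamped to 0..7."""
    code = round((i_ua - I_MIN_UA) / STEP_UA)
    return max(0, min(N_LEVELS - 1, code))


print(current_to_code(2.0))                    # 0
print(current_to_code(9.1))                    # 4
print(current_to_code(read_current_ua(40.0)))  # 7
```

At a read voltage of 0.4 V, the 2 to 16 μA range corresponds to conductances of 5 to 40 μS.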

Figure 12. Simulation of memristor read in the chip: (a) voltage waveforms, (b) current waveforms.

The details of a typical write operation during the simulation are shown in figure 13. In the chip the write is used to increment or decrement the value of the memristor. The chip CMOS STDP calculates the required increase or decrease to the synaptic weight. Then it applies one pulse or a set of pulses in proportion to the magnitude of the change in synaptic conductance to one of the two terminals of the memristor.

Figure 13. Simulation of a memristor write operation in the chip: (a) voltage waveforms, (b) current waveforms.

For an increment in synaptic conductance, pulses are applied to the positive terminal of the memristor. The writing of an increment is shown in figure 13. The key voltages provided by the CMOS circuit are shown in figure 13(a). The waveform labeled as P represents the voltage applied to the positive terminal of the memristor. In this simulation the write voltage used is 1.4 V. It can also be programmed to be a different value. The line labeled as N represents the voltage applied to the negative terminal of the memristor. It is approximately zero during this write operation. The dotted black line represents a control signal that sets the duration of the write pulse. To write an increment the chip applies from 1 to 4 write pulses.

The number of pulses is determined by an on-chip control circuit that reads the memristor current just after each write pulse. The set of pulses is stopped when the target increment value is achieved. In the example of the figure the target increment is achieved after two write pulses. The current through the memristor during the writing sequence is shown in figure 13(b). During each write pulse currents of about 100 μA can flow through the memristor. The read currents measured in the 4 μs intervals after each write pulse are in the desired range of 2–16 μA. For a decrement operation, a similar process occurs, except that the pulses are applied to the negative terminal of the memristor.
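
The write-and-verify sequence described above can be sketched as a short control loop. The device model here is purely hypothetical: `ToyMemristor` shifts its read current by a fixed 0.6 μA per pulse, which is not the behavior of the model of [30], and the uniform 2 μA code spacing is an assumption made for the example. Only the control flow (pulse, read, stop at the target, at most four pulses) follows the text.

```python
MAX_PULSES = 4  # the chip applies from 1 to 4 write pulses


def current_to_code(i_ua: float) -> int:
    """Assumed uniform quantizer: 2 uA -> code 0, 16 uA -> code 7."""
    return max(0, min(7, round((i_ua - 2.0) / 2.0)))


class ToyMemristor:
    """Hypothetical device: each pulse shifts the read current by a fixed step."""

    def __init__(self, read_current_ua: float, step_ua: float = 0.6):
        self.read_current_ua = read_current_ua
        self.step_ua = step_ua

    def pulse(self, polarity: int) -> None:
        """polarity +1: pulse on the positive terminal; -1: on the negative terminal."""
        self.read_current_ua += polarity * self.step_ua


def write_with_verify(device: ToyMemristor, delta_codes: int) -> int:
    """Apply pulses until the target code is reached; return the pulse count."""
    start = current_to_code(device.read_current_ua)
    target = max(0, min(7, start + delta_codes))
    if target == start:
        return 0  # no write is performed if there is no change
    polarity = 1 if target > start else -1
    pulses = 0
    while pulses < MAX_PULSES:
        device.pulse(polarity)
        pulses += 1
        code = current_to_code(device.read_current_ua)  # read after each pulse
        if (code - target) * polarity >= 0:
            break  # target reached: stop the pulse train
    return pulses


device = ToyMemristor(read_current_ua=6.0)  # starts at code 2
n_pulses = write_with_verify(device, delta_codes=+1)
print(n_pulses, current_to_code(device.read_current_ua))  # 2 3
```

With these toy parameters the increment from code 2 to code 3 needs two pulses; a decrement runs the same loop with the opposite polarity.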

We have also used the chip to implement other networks. The simulation of a more complex network with ten neurons is shown in figure 14. The topology of this network is shown in figure 14(a).

Figure 14. (a) Network with ten neurons. (b) Snapshots of switch states stored in digital memory for the network, with the 16 rows corresponding to the time slots and the 34 columns corresponding to the switches that are set in the chip fabric for each neuron type. Black represents an OFF state while white represents an ON state. (c) Simulation of output neuron C. (d) The temporal evolution of the synaptic conductance values of 16 synapses shows convergence to the correct states.

This topology is very similar to the one in figure 10 but with an additional layer of nine neurons located between the inputs and the output neuron. The output neuron, number of synapses and input signals are similar to those in figure 10. The functional behavior of this network is also similar to the one of figure 10 and can distinguish correlated inputs from uncorrelated inputs.

The digital memory of the chip is initialized as shown in figure 14(b). The switch states in each node are set as shown here and are used to route the spikes between the various neurons in the network during each STM cycle. The process is repeated from the beginning after the completion of each STM cycle. The output of the rightmost neuron (neuron C) during a simulation is shown in figure 14(c). The time evolution of the 16 weights, denoted as w1,1 to w1,16, is shown in figure 14(d). They are stored in memristors M1,1 to M1,16 within a node of the chip. It can be observed, according to the chip simulation, that after approximately 0.8 s the weights w1,1 to w1,8, which are associated with synapses receiving correlated inputs, all tend to a high value. The weights w1,9 to w1,16, associated with synapses receiving uncorrelated inputs, all tend to a low value. This is the desired behavior, which matches the results observed using idealized neural models [28, 29].
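The cyclic use of the stored switch states can be illustrated with a small Python sketch. The table dimensions (16 time slots by 34 switches) follow figure 14(b), but the particular ON entries, the function names and the list-of-lists layout are hypothetical and do not reproduce the actual memory contents of the chip.

```python
NUM_SLOTS = 16      # rows in figure 14(b): time slots of one cycle
NUM_SWITCHES = 34   # columns in figure 14(b): switches in the chip fabric


def make_table() -> list[list[bool]]:
    """Create a switch table with every switch OFF (black in figure 14(b))."""
    return [[False] * NUM_SWITCHES for _ in range(NUM_SLOTS)]


def switches_for_step(table: list[list[bool]], step: int) -> list[bool]:
    """Return the switch row for a global time step.

    The table is replayed from the beginning after each full cycle,
    so the slot index is the step modulo the number of slots.
    """
    return table[step % NUM_SLOTS]


table = make_table()
table[0][3] = True    # hypothetical ON switch in slot 0
table[5][10] = True   # hypothetical ON switch in slot 5

# Step 21 falls in the second cycle and maps back to slot 5.
active = [i for i, on in enumerate(switches_for_step(table, 21)) if on]
print(active)
```

The modulo lookup is what lets a fixed table of 16 rows describe routing for an arbitrarily long simulation.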

Each node consumes a total average power of 220 μW. This is divided into two parts: an average power of approximately 110 μW that is used by the analog circuitry of the node and another 110 μW that is used by the digital circuitry of the node. The analog circuitry includes the 128 memristor array associated with one node and the CMOS interface circuitry to write and read the memristors. The digital circuitry includes the neuron, the CMOS synapse/STDP circuit and CMOS memories.

The power of one active node was obtained by averaging over 0.9 s of simulation for the network shown in figure 14. The power of the chip is extrapolated for the worst-case scenario in which all nodes are being used. For larger chips the power will scale with the number of nodes, assuming that they are all being used. The complete chip described in this paper, with 576 nodes, 70 K time-multiplexed virtual synapses and input/output circuitry, has a power consumption of 130 mW. This power metric scales to about 225 W for 1 million nodes. This is higher than the ultimate goal of the project [2], but we believe that we can improve the power efficiency by adopting two measures. The first is to use our more aggressive design of the neuron, synapse and STDP circuit, as described in [23], which uses less power than the conservative design used in this chip. The second is to scale to a 22 nm CMOS process with a lower operating voltage of 0.6 V.
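The power figures quoted above can be cross-checked with a short calculation. The sketch below uses only numbers stated in the paper; the variable and function names are illustrative, and scaling the full-chip figure (including input/output circuitry) linearly with node count is an assumption that happens to reproduce the quoted value of about 225 W.

```python
NODE_POWER_W = 220e-6        # average power per node (110 uW analog + 110 uW digital)
NODES = 576                  # nodes in the chip described here
MEMRISTORS_PER_NODE = 128    # memristor array size per node
CHIP_POWER_W = 130e-3        # quoted chip power, including input/output circuitry

# Power of the nodes alone, with all nodes active (worst case).
nodes_only_w = NODES * NODE_POWER_W

# Total memristor count, which should match the 73 728 quoted in the paper.
memristors = NODES * MEMRISTORS_PER_NODE


def scaled_power_w(n_nodes: int) -> float:
    """Scale the quoted chip power linearly with the number of nodes (assumption)."""
    return CHIP_POWER_W * n_nodes / NODES


print(f"nodes only: {nodes_only_w * 1e3:.2f} mW")
print(f"memristors: {memristors}")
print(f"1 million nodes: {scaled_power_w(1_000_000):.1f} W")
```

The nodes alone account for roughly 127 mW, so under this reading the remaining few milliwatts of the 130 mW total are attributable to circuitry outside the nodes.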

5. Conclusions

We have presented a scalable neural chip that can be configured to implement different neural networks with neurons, synapses and STDP. The synaptic conductance values are stored in arrays of incremental memristors. The neuron and synaptic computations are realized using CMOS circuits. The CMOS is also used to implement circuitry to read and write memristors and for time multiplexing of synapse and STDP circuits. The chip design contains about 16 million CMOS transistors and 73 728 integrated memristors. The chip includes an auxiliary CMOS memory that enables the chip to operate without using the memristors. This memory was added to facilitate independent evaluation of some of the CMOS components, but it could be eliminated in other designs, if desired, to reduce the number of CMOS transistors. The CMOS circuit also includes programmability to store and adjust parameters such as average neuron speed, kinetic dynamics time constants and STDP characteristics. The number of transistors per node could be reduced significantly further by eliminating some of this programmability. Cadence-based chip simulations of both the CMOS circuitry and the memristor circuitry have been provided. Simulations of the chip emulating typical neural networks show the desired behavior, matching results observed using idealized neural models. This design provides a pathway toward very large scale neuromorphic systems that solve the scalability, connectivity and synaptic density challenges.

Acknowledgments

We gratefully acknowledge W Lu and his research group at the University of Michigan for contributions in memristor modeling and development, and members of the SyNAPSE project at HRL, including D Wheeler, for contributions in memristor array development, and S Ruiz-Monazzami, for layout. The authors would like to thank the reviewers for their comments, which helped to improve the quality of the paper. The authors also gratefully acknowledge the support for this work by Defense Advanced Research Projects Agency (DARPA) SyNAPSE grant HRL0011-09-C-001. The views, opinions, and/or findings contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of DARPA or the Department of Defense. This paper is approved for public release.
