
Dynamic computing random access memory

F L Traversa, F Bonani, Y V Pershin and M Di Ventra

Published 27 June 2014 © 2014 IOP Publishing Ltd
Citation: F L Traversa et al 2014 Nanotechnology 25 285201. DOI: 10.1088/0957-4484/25/28/285201


Abstract

The present von Neumann computing paradigm involves a significant amount of information transfer between a central processing unit and memory, with concomitant limitations in the actual execution speed. However, it has been recently argued that a different form of computation, dubbed memcomputing (Di Ventra and Pershin 2013 Nat. Phys. 9 200–2) and inspired by the operation of our brain, can resolve the intrinsic limitations of present-day architectures by allowing for the computation and storage of information on the same physical platform. Here we show a simple and practical realization of memcomputing that utilizes easy-to-build memcapacitive systems. We name this architecture dynamic computing random access memory (DCRAM). We show that DCRAM provides massively-parallel and polymorphic digital logic, namely, it allows for different logic operations with the same architecture by varying only the control signals. In addition, for realistic parameters its energy expenditure can be as low as a few fJ per operation. DCRAM is fully compatible with CMOS technology and can be realized with current fabrication facilities, and can therefore serve as a realistic alternative to the present computing technology.


1. Introduction

There is currently a surge of interest in alternative computing paradigms [1] that can outperform or outright replace the present von Neumann one [2]. It is clear that such alternatives have to depart fundamentally from the existing paradigm, both in their execution speed and in the way they handle information. For at least a couple of decades, quantum computing (QC) [3, 4] has been considered such a promising alternative, in view of the intrinsic massive parallelism afforded by the superposition principle of quantum mechanics. However, the range of QC applications is limited to a few problems such as integer factorization [5] and search [6].

In order to obtain a paradigm shift we then need to look somewhere else, but no farther than our own brain. This amazing computing machine is particularly suited for massively-parallel computation. It is polymorphic, in the sense that it can perform different operations depending on the input from the environment, and its storing and computing units (the neurons and their connections, the synapses [7]) are the same physical objects. Such a brain-inspired computing paradigm has been named memcomputing [8] and relies on resistors [9, 10], capacitors or inductors with memory (collectively called memelements) [11–13] both to store data and to perform computation. The features of memelements that make them very attractive from a practical point of view are: (i) they are a natural by-product of the continued miniaturization of electronic devices, and (ii) they can be readily fabricated [12, 14–16], making memcomputing a realistic possibility.

This work reports a memcomputing implementation based on solid-state memcapacitive systems [17] (capacitors with memory). While previous memcomputing schemes [18–25] employ intrinsically dissipative memristive devices [9, 10] (resistors with memory), we take advantage of the very low power dissipation in memcapacitive systems [11] to build a dynamic computing random access memory (DCRAM) capable of storing information and performing polymorphic logic computation. This new platform allows for massively-parallel logic operations directly in memory, thus offering a starting point for a practical solution to the von Neumann bottleneck problem [26]. Moreover, we emphasize that our idea is not limited to the specific type of memcapacitive system used for the model calculations reported in this work. For example, the ferroelectric capacitors [27] used in FERAM [28], and currently evaluated for new dynamic random access memory (DRAM) solutions, are also promising candidates for DCRAM.

While the general topology of DCRAM (figure 1) is similar to that of conventional DRAM, its memory cells are solid-state memcapacitive systems [17]. These are multilayer structures composed of insulating layers (three in the particular realization considered here) alternating with metal layers. The outermost insulating layers are made of high-k materials with a very high potential barrier, so that negligible charge can pass through them. The intermediate layer, on the other hand, is formed of a low-k material with a low potential barrier. This choice allows for non-negligible charge migration between the two internal metal layers under appropriate bias conditions. If the middle insulating layer is thin enough, the internal charge current is due to quantum tunnelling [29] and can be easily tuned over a wide range of values [30].


Figure 1. Possible realization of DCRAM. The memory cell (top right corner) is a solid-state memcapacitive system composed of three insulating layers separated by metal layers. The two-dimensional DCRAM circuit (bottom) is composed of an array of cells, each with an access element (MOSFET) whose gate is controlled by the word line. In order to perform READ or WRITE operations with a given cell, a positive voltage is applied to its word line, ground to its dual bit line, and suitable voltage pulses to its bit line. For computation purposes, several cells can be coupled through the bit and dual bit lines as described in the text.


Although no prototype of a solid-state memcapacitive system has been realized yet, we point out that an actual realization oriented towards VLSI circuits need not have a simple planar geometry. In fact, DRAM capacitors are normally of cylindrical shape. Consequently, a possible realization could consist of three cylindrical capacitors forming an effective solid-state memcapacitive system.

2. Example of memcapacitor structure and device optimization

The capacitance $C_d$ of the solid-state memcapacitive system we consider here is defined using the standard relation $q = C_d V_C$, where q is the charge on the capacitor plates (the external metal layers) and $V_C$ is the voltage applied to the system. Importantly, $C_d$ is a function of the internal state, namely, it depends on the ratio $Q/q$, where Q is the internal charge (see the top left inset in figure 4) [17]. Moreover, $C_d$ can diverge and take negative values [17], leading to a variety of transient responses.


Figure 4. Single-cell response to a voltage pulse under READ/WRITE conditions as described in figure 1. In our simulations, the bit and dual bit lines are modeled as transmission lines with typical DRAM parameters, R = 1.5 kΩ mm⁻¹ and C = 0.2 pF mm⁻¹, assuming a 1 mm line length. The voltage pulse is a smooth square pulse of 1 V amplitude and 1 ns width starting at t = 1 ns. The main graph shows the current response measured at the end of the bit line for several initial values of the internal charge Q; the red line refers to the Q = 0 initial condition. To quantify Q, an effective internal voltage difference (IVD) is defined as $V_i = Q/C_2$, with $C_2$ the geometrical capacitance of the intermediate layer, $C_2 = A \varepsilon_0 k_{\rm low\text{-}k}/d_{\rm low\text{-}k}$, where A is the surface area, $\varepsilon_0$ is the vacuum permittivity, $k_{\rm low\text{-}k}$ is the relative permittivity of the central layer, and $d_{\rm low\text{-}k}$ is its thickness. The top right inset shows the cell's dissipated energy. Bottom left inset: the effective internal voltage difference as a function of the voltage-pulse amplitude, 1 s after the pulse application.


The internal memory of the memcapacitive system [17] arises from the delayed response of the internal charge, caused by the tunneling barrier of the central insulating layer [17]. The tunneling barrier can be lowered by a voltage bias applied to the capacitor plates. In this case, a finite internal current (between the internal metal layers) can flow, changing Q. The internal charge Q becomes trapped when the shape of the potential barrier is restored. Therefore, applied voltage pulses can be used to control the internal charge Q, which is subsequently stored.

Here we discuss the features of the solid-state memcapacitor proposed in [17], using realistic parameter values compatible with the 2012 International Technology Roadmap for Semiconductors (ITRS) specifications [1]. From ITRS 2012, the capacitance of a DRAM cell is about 20–25 fF and the equivalent oxide thickness (EOT) is 0.5 nm for a high-k material with k = 50. A rapid calculation shows that the area of the metallic layers of an equivalent planar capacitor (common geometries for DRAM capacitors are not in general planar; several complex geometries, e.g., cylindrical or pedestal structures, are employed by different manufacturers) has to be of the order of 0.25 μm², so we use this value in our simulations. Moreover, the physical thickness of the insulator, obtained from the EOT and k = 50, ranges between 6 and 10 nm. Using these data, we consider the memcapacitor structure sketched in figure 1. The thickness of the two high-k insulators is taken to be 6 nm, and we assume they are made of a standard modern high-k material (e.g., TiO2) with k = 50. Finally, in our simulations we consider transmission lines with common values for DRAM fabrication, i.e., R = 1.5 kΩ mm⁻¹ and C = 0.2 pF mm⁻¹ for a length of 1 mm.
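As a quick cross-check of these numbers, both the area and the physical thickness follow from the ideal parallel-plate relations. A minimal sketch is given below; the 22 fF value is simply a representative midpoint of the quoted 20–25 fF range, and the EOT is referenced to SiO2 with k = 3.9:

```python
# Back-of-the-envelope check of the ITRS-derived geometry quoted in the text.
# Assumptions: ideal parallel-plate capacitor; C = 22 fF as a representative
# midpoint of the 20-25 fF range; EOT referenced to SiO2 (k = 3.9).
EPS0 = 8.854e-12      # vacuum permittivity, F/m
K_SIO2 = 3.9          # relative permittivity of SiO2
K_HIGH = 50.0         # high-k material assumed in the text

C_CELL = 22e-15       # DRAM cell capacitance, F
EOT = 0.5e-9          # equivalent oxide thickness, m

# Area of an equivalent planar capacitor: C = eps0 * k_SiO2 * A / EOT
area = C_CELL * EOT / (EPS0 * K_SIO2)     # m^2
# Physical thickness of the high-k layer: d = EOT * k_high / k_SiO2
thickness = EOT * K_HIGH / K_SIO2         # m

print(f"plate area ~ {area * 1e12:.2f} um^2")    # ~0.32 um^2, i.e. order 0.25 um^2
print(f"thickness  ~ {thickness * 1e9:.1f} nm")  # ~6.4 nm, within the 6-10 nm range
```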

The physical parameters (thickness and k value) of the low-k layer require more careful consideration, since the lifetime of Q depends strongly on them. Let us then focus on the storage mode, namely, the situation that follows a WRITE operation (the application of a 1 ns voltage pulse of a certain amplitude). In order to model the least favorable conditions, such as a strong external leakage current (due to imperfect switches and other processes), we assume $V_C = 0$ irrespective of the written bit. This choice differs from that of common DRAM where, in the storage mode, $V_C > 0$ if the stored bit is 1 and $V_C = 0$ if it is 0. Our main goal here is to evaluate the possibility of information storage on time scales long compared to typical DRAM decay times using, however, a DRAM-like chip structure.

Let us consider a physical model of the solid-state memcapacitive system with a barrier height of 0.2 eV for the low-k material and an infinite barrier for the high-k one. The equations governing the time variation of Q and q can be written as [17]

$q = C_0 V_C - (C_0/C_2)\,Q \qquad (1)$

$\dfrac{\mathrm{d}Q}{\mathrm{d}t} = -I(Q + q) \qquad (2)$

where I is the tunnel current through the low-k material, $C_0$ is the (constant) capacitance of the total memcapacitive system (with respect to q only), and $C_2$ is the capacitance of the internal capacitor formed by the low-k material and the internal metal layers. If the barrier is sufficiently thin, the current can be approximated by the Simmons formula [30].

Taking into account that $C_0 < C_2$ and that I is monotonic with a unique zero at $Q + q = 0$, there is a unique steady-state solution $Q = q = 0$ at $V_C = 0$. The top left inset in figure 2 shows that the current $I(Q+q)$ is very small at small values of $Q+q$, suggesting the possibility of a very low charge relaxation rate at non-zero Q. At $V_C = 0$, equation (2) can be rewritten as

$\dfrac{\mathrm{d}Q}{\mathrm{d}t} = -I\!\left((1 - C_0/C_2)\,Q\right) \qquad (3)$

This equation describes the decay of the internal charge Q in the storage mode. Figure 2 shows the decay of Q for several values of k and layer thickness. It is worth noticing that for certain parameter values, such as a thickness of 10 nm and k = 3.9, the information is stored for a long time: after $10^6$ s (about 11.5 days) a reasonable amount of charge still remains in the memcapacitive system. Thus, by modifying the parameters of the memory cell (the layer thickness, the dielectric constant, or even the barrier height) one can select an appropriate lifetime for the internal charge Q.
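To illustrate the storage-mode dynamics, equation (3) can be integrated numerically. The sketch below is deliberately simplified: the Simmons formula [30] used in the actual simulations is replaced by a generic odd, strongly nonlinear stand-in $I(x) = I_0 \sinh(x/x_0)$, and all numerical constants are illustrative placeholders rather than fitted device parameters, so only the qualitative behavior (fast initial decay followed by an extremely slow tail) should be compared with figure 2:

```python
# Minimal sketch of the storage-mode decay of equation (3):
#     dQ/dt = -I((1 - C0/C2) Q)   at V_C = 0.
# The Simmons tunneling formula [30] is replaced by the generic odd,
# strongly nonlinear stand-in I(x) = I0*sinh(x/x0); all constants are
# illustrative placeholders, not the device parameters of the paper.
import numpy as np

C0 = 5e-15    # total series capacitance (F), assumed, with C0 < C2
C2 = 20e-15   # capacitance of the middle layer (F), assumed
I0 = 1e-21    # tunnel-current scale (A), assumed
X0 = 2e-15    # charge scale of the nonlinearity (C), assumed

def current(x):
    """Stand-in for the tunnel current through the low-k barrier."""
    return I0 * np.sinh(x / X0)

def decay(q0, t_end, n=100_000):
    """Forward-Euler integration of dQ/dt = -I((1 - C0/C2) Q) on a log grid."""
    t = np.logspace(-6, np.log10(t_end), n)
    q = np.empty(n)
    q[0] = q0
    for i in range(1, n):
        q[i] = q[i - 1] - current((1 - C0 / C2) * q[i - 1]) * (t[i] - t[i - 1])
    return t, q

t, q = decay(q0=2.0 * C2, t_end=1e6)  # initial IVD Q/C2 = 2 V, run to 10^6 s
print(f"IVD after 10^6 s: {q[-1] / C2:.2f} V")
```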


Figure 2. Decay of the internal voltage difference (IVD) $Q/C_2$ for different thicknesses and dielectric constants of the middle insulator. The plot shows envelope curves of the internal charge decay that can be used to track the long-time behavior for any initial value of Q. As an example, the dotted black curve represents the decay of the IVD $Q/C_2$ for the initial condition $Q/C_2 = 2$ V at $k_{\rm low\text{-}k} = 3.9$ and $d_{\rm low\text{-}k} = 10$ nm; note that this curve converges to the corresponding envelope at longer times. The top left inset reports the current $I((1 - C_0/C_2)Q)$ versus $Q/C_2$. The top right inset presents the shape of the voltage pulse, with 10 V ns⁻¹ rising and falling edges.


3. WRITE and READ operations

In our scheme, the binary information is encoded in the internal charge Q of the memcapacitive system. It is assumed that $Q \geqslant Q_r$ corresponds to logic 1, $Q \leqslant -Q_r$ corresponds to logic 0, and the logic value is not defined when $-Q_r < Q < Q_r$. The threshold $Q_r$ is introduced to reliably distinguish the logic values, and as such is set by the sensitivity of the voltage sense amplifiers (VSAs) that we employ to detect the bit value.
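For concreteness, this read-out convention amounts to the following ternary map (a schematic sketch; the numerical value of $Q_r$ is set by the VSA sensitivity and is left as a parameter):

```python
from typing import Optional

def stored_bit(Q: float, Qr: float) -> Optional[int]:
    """Ternary read-out convention of the DCRAM cell.

    Q >= Qr -> logic 1;  Q <= -Qr -> logic 0;
    -Qr < Q < Qr -> undefined (non-readable) state, returned as None.
    """
    if Q >= Qr:
        return 1
    if Q <= -Qr:
        return 0
    return None
```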

When a voltage pulse is applied to a memory cell, its current response depends strongly on its internal charge Q. We thus use this current response to read the information stored in the memory cell: the common solution (widely used in consumer electronics, including standard DRAM technology) employs VSAs.

As depicted in figure 3, the VSA is connected to the memory cell in series with a voltage pulse generator. The ideal characteristics of the VSA are presented in the bottom left inset of figure 3. It is important to note that the VSA amplifies the response voltage $V_{\rm VSA}$ if $V_{\rm VSA} > V_A$, where $V_A$ is a certain threshold voltage. Generally, the delayed response of VSAs is associated with the internal capacitances of the metal-oxide-semiconductor (MOS) structures they are made of. During the delay time, the voltage pulse generator induces the response voltage $V_{\rm VSA}$. Once amplified, $V_{\rm VSA}$ provides the value stored in the memory cell.


Figure 3. Configuration and simulation of the READ-REFRESH process. The circuit configuration for this process is presented on the left: it consists of a memory cell connected to a pulse generator and a VSA. The bottom left plot shows the ideal VSA response when its input signal is below (red line) and above (blue line) its threshold. The simulation of a READ-REFRESH process for the initial condition of a partially decayed bit ($Q(t=0)/C_2 = \pm 0.5$ V) is given on the right. Here, the circuit is driven by a 0.5 ns long, 1 V amplitude voltage pulse during the delay time of the VSA. We report (a) the time variation of the normalized charges $Q/C_2$ (solid and dashed blue lines) and the voltage pulse (dotted red line), (b) the dissipated energy, and (c) the VSA output.


The WRITE, READ and logic operations with memcapacitive memory cells are performed with the help of control circuitry that provides the appropriate signals. In order to make the discussion more realistic, the parameters used throughout the simulations conform to the ITRS 2012 standards [1]. Simulations have been carried out using the general-purpose in-house NOSTOS (NOnlinear circuit and SysTem Orbit Stability) simulator, developed by one of the authors (FLT) initially for studying circuit stability [31, 32] and recently extended to analyze circuits including memory elements [33]. Let us consider the WRITE operation first. For this purpose, we employ the circuit configuration shown in the top right corner of figure 1, where the dual bit line (DBL) is grounded and the voltage pulse is applied to the bit line (BL). As mentioned above, the applied voltage pulse lowers the potential barrier between the internal metal layers, allowing for an internal charge redistribution.

An important observation at this point is that the WRITE process is of the threshold type. Indeed, one can define a threshold voltage $V_t$ such that there is no significant charge transfer between the internal plates at applied voltage amplitudes below $V_t$ (see the bottom left inset in figure 4). By contrast, at pulse amplitudes exceeding $V_t$ a considerable amount of charge can tunnel between the internal layers. In our device structure, $V_t$ is about 0.5 V, which is much larger than the perturbations usually induced by MOS transistor leakage currents. Moreover, the existence of $V_t$ also results in the internal charge saturation shown in the bottom left inset of figure 4.
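The threshold character of the WRITE process can be summarized by a simple behavioral model; this is an idealization of the smooth curve in the bottom left inset of figure 4, with a sharp threshold and a placeholder saturated charge q_sat, not the device equations:

```python
def write_pulse(Q: float, v_pulse: float, v_t: float = 0.5, q_sat: float = 1.0) -> float:
    """Idealized threshold WRITE of a DCRAM cell.

    Pulses with |amplitude| <= v_t (about 0.5 V in our structure) leave the
    internal charge essentially unchanged; larger pulses drive Q to the
    saturated value of matching sign. q_sat is a placeholder charge unit.
    """
    if abs(v_pulse) <= v_t:
        return Q
    return q_sat if v_pulse > 0 else -q_sat
```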

Next, let us consider the READ operation in DCRAM. As in DRAM, the READ process is destructive (see the top right plot of figure 3: when the voltage pulse acts, the information inside the memory cell is destroyed, since the final state of the memory cell is 1) and thus needs to be followed by a REFRESH operation. For a better understanding, consider the current response shown in figure 4. One notices significant variations in the cell response depending on the initial value of Q. These differences are used to sense the logic value stored in the cell with VSAs, similarly to DRAM technology. However, a VSA amplifies a voltage difference above or below a certain voltage threshold. To match this modus operandi, the current response can be transformed into a voltage response by connecting the bit and dual bit lines to the VSA input terminals. Since the voltage pulse used in READ changes the internal charge Q, a suitable REFRESH operation is applied after the READ.

In summary, the sequence consists of two steps. First, a voltage pulse (in our simulations, of 0.5 ns length and 1 V amplitude) is applied by the generator. It produces a voltage response that serves as the input to the VSA during its 'delay state'. Subsequently, if $V_{\rm VSA} > V_A$ the VSA amplifies the voltage $V_{\rm VSA}$ and 0 is written; on the contrary, if $V_{\rm VSA} < V_A$ the VSA does not act and 1 is written. Figure 3 reports simulations of the READ-REFRESH process for the extreme case of a partially decayed bit, showing all the features mentioned above. Moreover, we emphasize that the dissipated energy depends significantly on the bit value (0 or 1), due to an asymmetry in the VSA response: when $V_{\rm VSA} > V_A$ (the VSA is activated) the dissipated energy is about 5 fJ, while in the opposite case (initial value 1) it is about 1 fJ.
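Stripped of the analogue detail, the decision rule implemented by this two-step sequence reduces to a single comparison (a schematic sketch; v_vsa and v_a stand for the quantities $V_{\rm VSA}$ and $V_A$ defined above):

```python
def read_refresh(v_vsa: float, v_a: float) -> int:
    """READ-REFRESH decision rule: if the response voltage exceeds the VSA
    threshold, the VSA fires and a 0 is rewritten; otherwise a 1 is rewritten.
    The returned bit is both the value read out and the value restored."""
    return 0 if v_vsa > v_a else 1
```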

The top right inset of figure 4 shows the dissipated energy when a pulse of 1 ns length and 1 V amplitude is applied. This calculation gives a reference for the order of magnitude of the dissipated energy for all DCRAM operations (WRITE, READ, COMPUTATION), because of the similar operating conditions. It is worth noticing that this energy is of the order of a few fJ, comparable to the best cases of extremely low-energy storage and computation [34], and of computation alone with CMOS architectures [34]. Importantly, the information is stored directly in DCRAM, saving the power usually needed to transfer it to/from the central processing unit (CPU).

4. Polymorphic computation

Let us consider the simplest realization of logic gates, where two memory cells are used to store the input and (after the computation) the output values. For computation purposes, these memory cells are coupled as shown in figure 5 using appropriate switches at the ends of the BL and DBL. As shown in figure 5, the dynamics of the internal charges Q of two coupled cells subjected to a pair of synchronized voltage pulses depends on the initial combination of the internal charges of these cells. In this way, the final values of the internal charges can be considered the result of a logic operation on the values stored at t = 0 in these cells (see figure 5). As a specific example, let us consider configuration 2 of figure 5, assuming that voltage pulses of amplitude −0.73 V and 0.73 V are applied to the memory cells. Figure 6 demonstrates the evolution of Q for both cells. Notice that the final values of Q in cells A and B realize OR and AND gates, respectively. The dissipated energy (bottom plots in figure 6) is quite low: it is less than 2 fJ in the worst-case scenario and, for the (1, 0) initial configuration, much lower. However, it is worth noticing that, after the computation (see figure 6), the bits stored in the cells are only partially written: the computation must be completed by a REFRESH process, increasing the total required energy per operation by a few fJ, depending on the actual realization of the VSA.
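At the truth-table level, the net effect of this pulse pair (configuration 2 of figure 5) can be summarized as follows; this behavioral model reproduces only the input-output map of figure 6, not the underlying charge dynamics:

```python
# Behavioral model of configuration 2 of figure 5 with -/+0.73 V pulses:
# after the pulse pair, cell A holds OR(A, B) and cell B holds AND(A, B).
def gate_config2(a: int, b: int) -> tuple:
    return (a | b, a & b)  # (new content of cell A, new content of cell B)

for a in (0, 1):
    for b in (0, 1):
        print(f"(A={a}, B={b}) -> (A'={a | b}, B'={a & b})")
```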


Figure 5. Map of logic operations. Two memory cells can be connected in four different ways, giving rise to four logic operations. The symbols $+$ and $-$ denote the OR and NOT operations, respectively, while AND is denoted by implicit multiplication. Here, $V_1$ and $V_2$ are the amplitudes of the voltage pulses applied to the external connections of the coupled memory cells. Depending on these amplitudes, there are several regions in the logic map. Amplitudes belonging to the identity region do not change the initial values in the memory cells. Amplitudes belonging to the logic operation region perform computation as in the scheme to the right. Amplitudes belonging to the forced state region change the initial values to 1 or 0, depending on the device coupling order and polarity. Amplitudes belonging to the non-readable state region produce intermediate (non-readable) internal states with $-Q_r \leqslant Q \leqslant Q_r$.


Figure 6. Time variation of the IVD and dissipated energy for the second logic gate of figure 5. The voltage pulse amplitudes are $V_1 = 0.73$ V and $V_2 = -0.73$ V, and the pulse length is 1 ns. The evolution of the IVD for both memory cells at different initial conditions is shown by different line styles in (a) and (c). The dissipated energy is plotted in (b) and (d).


Considering the possible connections and device polarities, one finds that two coupled cells can be used to form a (redundant) basis for a complete set of logic operations. In fact, it is known [22, 35] that by combining only AND and NOT, or OR and NOT, any logic expression can be evaluated. In our case, with two coupled memory cells we can perform six different two-bit operations, depending both on how the cells are coupled and on the amplitudes of the applied voltage pulses. Therefore, these two coupled memory cells form universal logic gates, as is exhaustively shown below.
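The completeness claim can be checked abstractly. The sketch below verifies, by exhaustive enumeration, that AND and NOT alone generate OR (via De Morgan), NAND and XOR; the compositions are textbook identities, not the DCRAM pulse sequences themselves:

```python
# Functional completeness check: with AND and NOT (De Morgan gives OR),
# the remaining standard two-bit gates follow by composition.
AND = lambda a, b: a & b
NOT = lambda a: 1 - a

OR   = lambda a, b: NOT(AND(NOT(a), NOT(b)))   # De Morgan
NAND = lambda a, b: NOT(AND(a, b))
XOR  = lambda a, b: OR(AND(a, NOT(b)), AND(NOT(a), b))

assert all(OR(a, b) == (a | b) and NAND(a, b) == 1 - (a & b)
           and XOR(a, b) == (a ^ b)
           for a in (0, 1) for b in (0, 1))
print("AND + NOT generate OR, NAND and XOR on all inputs")
```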

The universal gate is not the only advantage offered by the DCRAM architecture: DCRAM is capable of intrinsically parallel computation. In fact, after only one computation step, we find a different output on each memcapacitive system, which means two operations at the same time. As shown below, by connecting three memory cells and varying the pulse amplitudes and the connection topology, we can perform even more complex operations in one step and obtain three different outputs, one written into each memory cell. More importantly, one can perform simultaneous operations over multiple groups of two or three coupled cells. We also point out that, by using only one of the possible connection topologies of three memory cells, we obtain another universal gate for two-bit computation with a fixed connection topology, a possible way to avoid the supplemental circuitry needed for dynamic connections.

4.1. Two-bit and three-bit operations with dynamic connections

The parallel logic operations performed by DCRAMs, summarized in figure 5, can be used to define logic gates forming a (redundant) complete basis for any Boolean logic function. In order to prove this claim, figure 7 shows how to perform all possible two-bit logic operations using DCRAM gates. We notice that, in the worst-case scenario, a three-bit register (three cells) is needed (the third bit, initially set to 1, is used to perform negation) and a two-level operation is required. Compared with CMOS NAND or NOR logic, DCRAM logic circuits require fewer components: the commonly used CMOS NAND or NOR logic gates require up to a five-level operation scheme and up to 20 transistors to perform the same set of two-bit functions.


Figure 7. Two-bit logic functions. The additional bit set to 1 is used for negation. Circled numbers refer to the logic operations of figure 5. Colors denote the memory cells involved in each operation and the cell storing the output. W(1) and W(0) stand for the operations WRITE 1 and WRITE 0, respectively.


Using the same scheme, we can perform any n-bit operation exploiting two-bit gates only. Here, we consider three-bit operations, for which a complete treatment is possible. Using a five-bit register made of the A, B and C inputs and two additional bits, one set to 1 and employed for negation, and the other equal to one of the three inputs A, B or C (depending on the desired logic function), any three-bit logic operation can be performed using at most a four-level operation scheme (figure 8). In figure 9(b) an example of a four-level three-bit operation is shown. In this case, the register is composed of the three inputs (A, B and C) and only one additional bit (here A), because no negation bit is required. In figure 9(a), the comparison with two-input NAND logic (as possibly used in programmable digital circuits) is reported. It is worth noticing that, using CMOS NAND logic, the same operation is performed within a five-level operation scheme using 10 NAND gates, i.e., 40 transistors, showing that the complexity of the CMOS circuit is much higher than that of our DCRAM implementation.
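For reference, the truth table of the three-bit function of figure 9, $ABC+\bar{A}B\bar{C}+\bar{A}\bar{B}C$, is enumerated below; this is a plain Boolean check, independent of how the four-level DCRAM scheme realizes it:

```python
# Truth table of the figure 9 example: f(A, B, C) = ABC + ~A B ~C + ~A ~B C.
def f(a: int, b: int, c: int) -> int:
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (a & b & c) | (na & b & nc) | (na & nb & c)

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, "->", f(a, b, c))  # 1 only for (0,0,1), (0,1,0), (1,1,1)
```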


Figure 8. Number of operation levels for any three-bit Boolean function. There are $2^8$ possible Boolean functions involving three bits, so on the x-axis each function is coded using the equivalent decimal number.


Figure 9. (a) CMOS-NAND logic circuit for the three-bit operation $ABC+\bar{A}B\bar{C}+\bar{A}\bar{B}C$. (b) DCRAM four-level scheme for the same logic function.


4.2. Fixed connection two-bit operations

Finally, we consider the three-bit gate presented in figure 10. We assume a configuration with fixed connections (while the computation is performed). As shown in figure 10, by varying the pulse amplitudes applied to the cells we can obtain two different logic outputs for each memory cell; we define these as the logic outputs of the first and of the second kind. Moreover, at each computation step the REFRESH and WRITE processes are performed to prepare the cells for the next computation step. The bits 1, A and B are initially written in the three memory cells (the register). Then we apply the synchronized voltage pulses $V_1$ and $V_2$ with amplitudes 1.15 V and −1.15 V, respectively, to obtain the gate of the first kind. The first-level operation is completed by the REFRESH of the second and third memory cells and by writing 1 into the first one. The second-level operation then implements the gate of the second kind, and the Boolean function $AB+\bar{A}\bar{B}$ is obtained.
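The target function of this two-level sequence is the equivalence (XNOR) of A and B, as the following exhaustive check confirms:

```python
# The fixed-connection two-level scheme computes f(A, B) = AB + ~A ~B,
# i.e. the XNOR (equivalence) of the two inputs.
def f(a: int, b: int) -> int:
    return (a & b) | ((1 - a) & (1 - b))

assert all(f(a, b) == 1 - (a ^ b) for a in (0, 1) for b in (0, 1))
print([f(a, b) for a in (0, 1) for b in (0, 1)])  # [1, 0, 0, 1]
```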


Figure 10. Computation of the logic function $AB+\bar{A}\bar{B}$ using three connected memory cells. The topology of the connections is represented at the top left of the figure. The two gates obtained by varying the pulse amplitudes are sketched at the top right of the figure and are indicated by the two different textures shown to the left of the gates.


Using the processes described above, we can set up a universal gate capable of performing any two-bit logic operation without changing the topology of the circuit. For example, figure 11 shows how to obtain all possible two-bit logic functions using the three-bit fixed polymorphic gate of figure 10. Finally, figure 12 reports the variety of three-bit polymorphic logic gates that can be implemented using three coupled memory cells. In this case, the separation of the applied voltage amplitudes into two regions is evident, providing polymorphism without any change of the connection topology.


Figure 11. Two-bit logic functions. The configuration of the connections for the three-memory-cell polymorphic gate is the same as in figure 10. The textures indicate the gate kind as in figure 10 (depending on the pulse amplitudes). W(1) and W(0) stand for the operations WRITE 1 and WRITE 0, respectively, and R stands for REFRESH. The functions $\bar{B}$ and 1 are not reported for the sake of conciseness, because they can be obtained as in the fifth column for $\bar{A}$ and in the first column for 0, respectively.


Figure 12. Logic gates with three coupled cells. In the center, we show a map of operations as a function of the amplitudes of the pulses applied to the external connections of the coupled memory cells. Depending on these amplitudes, there are several regions in the logic map. Amplitudes belonging to the identity region do not change the initial values. Amplitudes belonging to the logic operation region realize the logic functions presented in the schemes to the right and left. Amplitudes belonging to the forced state region change the initial values to 1 or 0, depending on the device coupling order and polarity. Amplitudes belonging to the non-readable state region produce intermediate (non-readable) internal states with $-Q_r \leqslant Q \leqslant Q_r$. The symbols $+$ and $-$ denote the OR and NOT operations, respectively; implicit multiplication denotes AND.


5. Conclusions

In conclusion, we have introduced a simple, practical, and easy-to-build memcomputing architecture that processes and stores information on the same physical platform using two-terminal passive devices (memcapacitive systems). Being low-power, polymorphic and intrinsically massively-parallel, DCRAM can significantly improve the computing capabilities of present-day von Neumann architectures. This is achieved by transferring a significant amount of data processing directly into the memory where the data are stored. Although it is still an open question which specific algorithms will benefit most from such an approach, we expect our scheme to be extremely useful in scientific calculations, image and video processing, and similar tasks.

In order to make a specific estimate of the computation speed-up afforded by our approach, let us compare the performance of a traditional personal computer equipped with typical DRAM chips with that of a DCRAM-based computer. For example, consider a 4 GB memory system with two 2 GB banks, each consisting of eight 256 MB×8, four-bank devices [36]. Each of the four banks in a 256 MB device is split into eight arrays of 8 MB each. If there are 65 536 rows of 1024 columns of bits in each array, a row access selects 1024 bits per array, giving a total of 65 536 bits across eight chips of eight arrays each. This is the number of bits that can be involved simultaneously in a single parallel DCRAM calculation, which lasts about 20 ns (accounting for a four-level computation) as discussed above (here we assume that all 65 536 bits are grouped into small few-bit circuits at each calculation step). On the other hand, a standard CPU processes 64 bits per clock cycle. Accounting for a memory access time of 10 ns [37], we conclude from this simple example that a DCRAM could in principle be up to 1000 times faster than the usual von Neumann architecture.
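The arithmetic behind this estimate is summarized below: the raw parallelism of a row access is 65 536/64 = 1024 (the origin of the "up to 1000 times" figure), while with a 20 ns DCRAM step against a 10 ns memory access the sustained throughput ratio is about half of that:

```python
# Throughput estimate behind the "up to 1000 times" figure of the text.
BITS_PER_ROW = 1024 * 8 * 8   # bits per row access: 8 chips x 8 arrays x 1024
DCRAM_STEP = 20e-9            # s, one four-level parallel DCRAM computation
CPU_BITS = 64                 # bits processed by the CPU per transaction
ACCESS_TIME = 10e-9           # s, memory access time [37]

print(f"parallelism per access: {BITS_PER_ROW // CPU_BITS}x")    # 1024x
ratio = (BITS_PER_ROW / DCRAM_STEP) / (CPU_BITS / ACCESS_TIME)
print(f"sustained throughput ratio: {ratio:.0f}x")               # ~512x
```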

Finally, we emphasize again that an actual realization of DCRAM is not limited to the solid-state memcapacitive systems considered in this work: other memcapacitive systems could serve as even better solutions for practical implementations of DCRAM. We thus hope that our results will be of interest to a wide community of researchers and will lead to the next generation of brain-like computing memory.

Acknowledgments

This work has been partially supported by the Spanish Project TEC2011-14253-E, NSF grants No. DMR-0802830 and ECCS-1202383 and the Center for Magnetic Recording Research at UCSD.
