1 Introduction

Scaling of MOS transistors leads to several problems, such as short-channel effects, quantum effects, leakage currents, and lithography and fabrication issues [1]. It is therefore expected that Moore’s law cannot be sustained in future technologies. Quantum-dot cellular automata (QCA) is a promising transistor-less nanotechnology that is predicted to supplant current CMOS technology. QCA was proposed in 1993 by C.S. Lent et al. [1], and the first fabrication followed in 1997. The quantum cell is the fundamental element of QCA. It is square in shape, has four dots arranged at its corners and contains two electrons. Owing to electrostatic interaction, the two electrons settle into one of two polarizations, which represent the two binary states, logic 0 and logic 1. As a result, digital circuits can be built from quantum-cell layouts, as shown in [2]. QCA thus offers a promising approach to nanoscale computation and opens up a new outlook on circuit design.

In any computer arithmetic computation such as addition, subtraction or multiplication, the adder plays a very important role. Consequently, for the design of high-performance arithmetic circuits, an efficient adder is necessary. This paper introduces an energy-efficient and area-optimized 1-bit full adder design in QCA which effectively brings down the number of quantum cells, area and energy dissipation. Further, to demonstrate the efficiency of the proposed full adder design, the Baugh–Wooley multiplication algorithm has been implemented using the proposed 1-bit full adder and its performance has been analyzed in terms of cell count, area, latency and power dissipation.

The rest of the paper is structured as follows. Section 2 gives the introduction of basic QCA devices. Section 3 describes the clocking mechanism in QCA. Section 4 illustrates the previous works related to adders and multipliers. Section 5 gives the introduction and design of the novel 1-bit full adder and its simulation results. Section 6 describes the implementation of Baugh–Wooley algorithm using the proposed 1-bit adder, and Sect. 7 provides the simulation results and discussion. In Sect. 8, the power analysis is carried out for the quantum multiplier circuit. Finally, Sect. 9 gives the conclusions.

2 Basic layouts in QCA

As discussed in [2], a quantum cell has four circular vacant positions situated at the four corners of a square and two electrons. When no electric field is applied, the electrons are at rest position in the middle of the cell.

During excitation, the electrons exhibit any one of the two polarizations. The polarization determines the logic state of the quantum cell. As shown in Fig. 1, polarization −1 indicates logic 0, while polarization +1 indicates logic 1. If ρ is the charge developed by the electron in a dot in a quantum cell, the polarization is given by the simple formula,

$${\text{P}} = \frac{{\left( {\rho_{2} + \rho_{4} } \right) - \left( {\rho_{1} + \rho_{3} } \right)}}{{\rho_{1} + \rho_{2} + \rho_{3} + \rho_{4} }}.$$
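As a minimal sketch of the formula above, the polarization can be evaluated for the two fully polarized charge configurations. The function name and the assumption that each dot charge is given as a plain number (in units of one electron charge) are illustrative and not taken from the cited works.

```python
def polarization(rho1, rho2, rho3, rho4):
    """Cell polarization from the four dot charges, following the formula above."""
    total = rho1 + rho2 + rho3 + rho4
    if total == 0:
        raise ValueError("the cell carries no charge")
    return ((rho2 + rho4) - (rho1 + rho3)) / total


# Both electrons on dots 2 and 4 (one diagonal): polarization +1, logic 1.
p_logic1 = polarization(0.0, 1.0, 0.0, 1.0)

# Both electrons on dots 1 and 3 (the other diagonal): polarization -1, logic 0.
p_logic0 = polarization(1.0, 0.0, 1.0, 0.0)

# Charge spread evenly over all four dots: no net polarization.
p_neutral = polarization(0.5, 0.5, 0.5, 0.5)
```

Intermediate charge distributions give values between −1 and +1, which is why the polarization can be read as a measure of how strongly a cell holds its logic state.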

In CMOS technology, the interconnections between transistors are made with metal wires. In QCA, however, the wiring is also done with quantum cells [2]. Figure 2 shows two ways of wiring in QCA: the 90° normal wire in Fig. 2a and the 45° inverter wire in Fig. 2b. The former is for normal use, while the latter is used for wire crossings.

Fig. 1 Quantum cell and its polarization [2]

Fig. 2 Wiring in QCA [2]

QCA technology employs two main gates, namely majority voter (MV) and inverter gates. These two gates are designed with quantum cells shown in Fig. 1. Figure 3 shows the majority voter gates and inverter gates. The 3-input majority voter gate depicted in Fig. 3a contains five quantum cells. Between the input and output cells, there exists a cell called device cell which exhibits the function based on the inputs applied on cells 1, 2 and 3. If P, Q and R are the inputs of the majority voter gate, then the output from the output cell is

$${\text{Majority (P, Q, R) = (P and Q) + (Q and R) + (P and R)}}.$$

Figure 3b displays the 5-input majority voter gate proposed in [3, 4] with 10 quantum cells, and Fig. 3c shows the design of the inverter in the traditional way and a simpler way.
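The majority function can be illustrated with a short sketch. The helper names below are hypothetical. The code checks that a generic majority vote agrees with the three-input expression given above, and it also shows the well-known property (not stated in this section) that fixing one input of a 3-input majority gate to 0 or 1 yields a 2-input AND or OR gate.

```python
from itertools import product


def majority(*inputs):
    """Majority vote over an odd number of binary inputs (3-input, 5-input, ...)."""
    if len(inputs) % 2 == 0:
        raise ValueError("a majority gate needs an odd number of inputs")
    return int(sum(inputs) > len(inputs) // 2)


def majority3_sop(p, q, r):
    """Sum-of-products form: (P and Q) + (Q and R) + (P and R)."""
    return (p & q) | (q & r) | (p & r)


# The generic vote and the sum-of-products form agree on all 8 input patterns.
forms_agree = all(
    majority(p, q, r) == majority3_sop(p, q, r)
    for p, q, r in product((0, 1), repeat=3)
)

# Fixing the third input turns the gate into AND (input = 0) or OR (input = 1).
and_outputs = [majority(p, q, 0) for p, q in product((0, 1), repeat=2)]
or_outputs = [majority(p, q, 1) for p, q in product((0, 1), repeat=2)]
```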

Fig. 3 a 3-Input majority layout [3], b 5-input majority voter layout [4] and c inverters [4]

3 Clocking in QCA

Unlike CMOS, QCA requires a clocking mechanism for both combinational and sequential circuits. Four clocks are available, namely clock0, clock1, clock2 and clock3. The fundamental difference between them is the timing with which data is latched and transferred within the cell. Each clock cycles through four phases (switch, hold, release and relax), and at any instant no two clocks are in the same phase. The propagation delay of a quantum logic circuit depends on the clocking and the number of clocks used [3]. Figure 4 shows the four phases of the clock in QCA.

Fig. 4 Clocking in QCA [4]
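The relation between the four clocks can be sketched as follows, assuming the usual arrangement in which each clock lags the previous one by a quarter period. The phase names and the function `clock_phase` are illustrative choices, not identifiers from any QCA tool.

```python
# The four phases each clock passes through, in order, one per quarter period.
PHASES = ("switch", "hold", "release", "relax")


def clock_phase(zone, step):
    """Phase of clock `zone` (0..3) at quarter-period `step`.

    Assumption: clock k lags clock k-1 by exactly one quarter period.
    """
    return PHASES[(step - zone) % 4]


# At every quarter period the four clocks occupy four different phases.
always_distinct = all(
    len({clock_phase(zone, step) for zone in range(4)}) == 4
    for step in range(8)
)

# When clock0 holds its data, clock1 is switching and can take the data over.
handover = (clock_phase(0, 1), clock_phase(1, 1))
```

Under this assumption, data moves from a cell in clock0 to a neighbouring cell in clock1 in a quarter of a clock cycle, which is why latencies in QCA are usually quoted in multiples of 0.25 clock cycles.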

4 Related works on adders and multipliers using QCA

Adders and multipliers play a vital role in many arithmetic operations, especially, in the design of the multiply and accumulate (MAC) unit in DSP processors. Hence, this paper focuses on the implementation of a full adder and multiplier using QCA.

4.1 Related works on adders using QCA

Many researchers have implemented various QCA configurations [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21] to realize the sum and carry functions of a full adder. Some adders were developed using multilayer layouts and others using coplanar layouts. The Tougaw and Lent adder, the Wang adder and the Abedi adder are examples of coplanar adders. In [2], Tougaw and Lent proposed QCA cells that could be used to obtain the XOR function and a 1-bit full adder. The authors used five MV gates, three NOT gates and nine wire crossings to design the full adder. Therefore, an m-bit Tougaw and Lent adder requires 5m MV gates, 3m NOT gates and 9m wire crossings, and its latency is m + 0.25 clock cycles. The high number of wire crossovers and the poor clocking scheme are the major limitations of this adder. Furthermore, it uses 45° rotated QCA cells for wire crossovers, which provide weak coupling.

In 2019, Ahmadpour et al. introduced a reliable full adder in [4] using an effective five input majority layout structure resistant to malfunctions. This 5-input majority layout gate is later used to design a fault tolerant full adder with 173 quantum cells, which can tolerate cell misalignment, cell removal and extra cell insertion.

Wang [5] used majority logic reduction techniques to refine and simplify the Tougaw and Lent full adder layout. The m-bit adder layout contains 3m MV gates, 2m NOT gates and 6m wire crossings. Although the design complexity is reduced, the propagation delay is the same as that of the Tougaw and Lent adder [2], namely m + 0.25 clock cycles for the m-bit Wang adder. The output carry computation is similar to that of the Tougaw adder, while the ‘sum’ is calculated with MV gates and an inversion operation. The main advantage of this adder is the efficient clocking scheme that minimizes input cross talk.

Cho adder [6], Azhagdi adder [7] and Zhang adder [8] are examples of multilayer adders. H. Cho and E. Swartzlander proposed an optimized carry flow adder in [6] using 3m MV gates, 2m NOT gates and 2m crossings. This adder keeps the latency per bit within a single clock zone, which significantly reduces the latency of wider adders.

In [7], Azhagdi suggested using two majority voter gates and one NOT gate for a 1-bit full adder. This adder was constructed using 120 quantum cells in 0.14 μm² with a delay of 1.25 clock cycles. The Zhang adder [8] is identical to the Wang adder but multilayered; the sum algorithm is the same in both cases. It includes 3m MV gates, 2m NOT gates and 3m multilayer wire crossings for an m-bit adder. The delay is m clock cycles, which is less than that of Wang’s adder.

In [9], Angizi described a full adder with 95 quantum cells within an area of 0.09 μm2 and with a delay of 1.25 clock cycles. Pudi [10] suggested Brent–Kung parallel prefix adder architecture focused on QCA. In this design, the authors employed majority logic reduction techniques to minimize the adder delay. This adder structure minimizes the carry computation to a prefix computation.

Stephania Perri et al. in [11] proposed area- and delay-efficient 8-, 16-, 32- and 64-bit adders in QCA. However, the number of cells is 1606 and the area of the 8-bit adder is 1.13 μm², which is very high for a QCA design. In [12], Hashemi et al. introduced a 1-bit full adder with 79 cells and a latency of 1.25 clock cycles in a footprint of 0.05 μm². Labrado et al. [13] designed a single-layer full adder and subtractor. In this work, the 1-bit full adder was built with 63 cells in an area of 0.05 μm² with a delay of 0.75 clock cycles.

The Abedi adder [14] requires two clock zones of non-adjacent cells shifted by 180° for crossing two wires. This results in a QCA full adder architecture with a substantial reduction in cell count and area. For an m-bit adder, the arrangement uses 3m majority gates, 2m inverters and no wire crossover.
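The per-bit gate and crossing counts quoted above for the m-bit adders can be collected in a small lookup, which makes the scaling with word length easy to compare. This is only a bookkeeping sketch: the dictionary keys and the function name are chosen here for illustration, and the numbers are exactly those stated in this section.

```python
# Per-bit resources quoted in this section: (MV gates, NOT gates, wire crossings).
PER_BIT_RESOURCES = {
    "Tougaw-Lent [2]": (5, 3, 9),
    "Wang [5]": (3, 2, 6),
    "Cho [6]": (3, 2, 2),
    "Zhang [8]": (3, 2, 3),
    "Abedi [14]": (3, 2, 0),
}


def adder_resources(name, m):
    """Gate and crossing counts of an m-bit adder built from the named 1-bit cell."""
    mv, inv, crossings = PER_BIT_RESOURCES[name]
    return {
        "majority_gates": mv * m,
        "inverters": inv * m,
        "wire_crossings": crossings * m,
    }


# Example: resources of the 8-bit versions of two of the designs.
tougaw_8bit = adder_resources("Tougaw-Lent [2]", 8)
abedi_8bit = adder_resources("Abedi [14]", 8)
```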

Later in [15,16,17], the authors designed a 1-bit full adder and implemented a ripple carry adder using it. The cell counts of these three full adders are 47, 46 and 44, respectively. The areas of these adders are 0.03 μm2, 0.04 μm2 and 0.04 μm2, respectively.

Further in 2019, Zoka et al. built a novel full adder–subtractor in nanotechnology utilizing QCA in [18]. The full adder is implemented in this system using 44 cells in 0.06 μm2 area with latency of 1.25 clock cycles.

Heikalabad et al. in [19] used an XOR gate to build a layout for the addition of three input bits. However, the full adder required 41 quantum cells in an area of 0.03 μm². Ahmad et al. used a new paradigm of explicit cell interaction in 2016 in [20] to create a QCA adder. A new well-optimized configuration was proposed for the 3-input XOR gate with which the full adder was built. The results show that the 1-bit full adder has 41 quantum cells in an area of 0.04 μm² with a latency of 0.5 clock cycles.

In [21], Balali et al. have proposed a 1-bit full adder using 41 quantum cells in 0.04 μm2 area with a delay of 0.5 clock cycles. Among these 41 quantum cells, 14 cells are used to design XOR gate using the explicit interaction method. Mohammadi et al. in [22] published a novel approach to formulating a 1-bit full adder. This method involves 38 quantum cells with a delay of 0.75 clock cycles utilizing 0.02 μm2 area. Bibhash Sen et al. in [23] proposed an efficient layout for 3-bit addition. The authors suggested a 5-input majority layout and designed a full 1-bit adder using it. The number of cells and the adder delay is less, but the circuit is more complex for designing higher-order adders.

In 2016, Sarmadi et al. in [24] suggested a 1-bit full adder utilizing QCA in a multilayer method. This method introduced full 1-bit adder in six layers. The proposed adder was implemented using 30 cells with 1 clock cycle delay in 0.004 μm2 area.

Lei Wang et al. introduced an integrated adder architecture using QCA in [25]. In this method, the full 1-bit adder consists of 28 cells in a 0.01 μm2 area with latency of 0.75 clock cycles. Md. Abdullah-Al-Shafi et al. suggested a multilayer solution to a universal reversible logic gate in [26]. The architecture has good power output yet the delay is more due to the multilayered construction. In 2019, the authors of [27] and [28] have proposed a novel 1-bit full adder and implemented a ripple carry adder using that 1-bit full adder. The 1-bit full adders proposed in [27, 28] are designed based on multilayer approach.

4.2 Related works on multipliers using QCA

Many researchers have proposed designs for multipliers using QCA. M. G. Ganavi and B.S. Premananda [29] have proposed a Wallace tree multiplier using adiabatic logic. Here, the authors have used the positive feedback adiabatic logic in CMOS 180 nm technology. The power dissipation is reduced to 4.11 µW as adiabatic logic was used. Later in [30, 31], serial parallel multipliers were developed using QCA technology. In [30], the multiplier needs 264 quantum cells in 0.27 μm2 area with 0.75 clock cycle delay. However, in [31], the 4-bit serial parallel multiplier requires 256 quantum cells in 0.38 μm2 area with energy dissipation of 0.22nW.

Seong-Wan Kim et al. proposed parallel multipliers and array multipliers in [32]. The authors showed various styles of multiplier algorithms, such as the Wallace tree, Dadda and array multipliers. The 4 × 4 Wallace tree multiplier proposed in this work requires 3295 quantum cells in an area of 7.39 μm².

Vikram Kumar Pudi and K. Sridharan introduced, in [33], a Baugh–Wooley multiplier with an area of 1.8 μm² and 1982 quantum cells. In [34], Hasan Faraji et al. proposed a tree-based Wallace multiplier with 2900 quantum cells in an area of 3.69 μm². In [35], Ismo Hanninen proposed a Booth-recoded radix-4 multiplier using ultra-fast carry save addition, where the area is 140n² and the circuit latency is 2.5n + 16 for n-bit multiplication.

Ashvin et al. suggested a 4-bit Vedic multiplier using carry save method in [36]. Using this method, cell count decreased by 30 percent, area improved by 60 percent and latency by 50 percent. The authors also proposed and compared the results to a 4x4 Wallace multiplier.

5 Proposed full adder design

From the analysis of related works, it can be summed up that current full adder QCA designs have the drawbacks of cell count, area, inefficient clocking and latency. Earlier works on adders using QCA have primarily focused on reducing majority voter gates. Nonetheless, reducing majority voter gates alone does not reduce the number of cells needed to make the quantum circuit. Therefore, a novel full adder architecture is proposed in this paper, which aims to mitigate the cell count, provide efficient clocking and thereby reduce the circuit latency.

Considering that the XOR gate is the essential gate needed to obtain the sum function of the full adder, a better design of the XOR gate is necessary for a full adder structure. Previous studies that implemented XOR gate have mostly focused on the reduction of majority voter gates. A 3-input XOR gate was designed using explicit cell interaction method in [20, 21], where the data from input cell to device cell propagate with complemented form from all three directions. Two 2-input XOR gate designs using QCA were suggested in [37]. Later in [38], A.N. Bahar et al. proposed a 3-input XOR gate which had fewer cells. This XOR gate was designed with only 10 cells having an area of 0.01 μm2.

The existing 2-input and 3-input XOR logic implementations using QCA are compared in Table 1 which suggests that the 3-input XOR gate proposed in [38] is better in terms of number of cells, area and clock phases.

Table 1 Comparison of 3-input XOR gates

In the XOR gate [38], clock1 is applied to the inner five quantum cells, and clock0 is applied to the remaining outer cells. The application of clocks to the quantum cells is differentiated based on cell colors as shown in Fig. 5. The inputs are applied to clock0 cells, and the output is taken from clock1 cells. The time gap between the two clocks and the arrangement of the cells make the entire logic to work as an XOR function. This 3-input XOR gate [38] is used in this paper to design a novel 1-bit full adder.

Fig. 5 3-Input XOR gate [38]

The logic diagram of a 1-bit full adder is shown in Fig. 6.

Fig. 6 Logic diagram of a 1-bit full adder circuit

For the 1-bit full adder design proposed in this paper, the 3-input XOR gate [38] is used to obtain the sum output and a majority voter gate for carry output. The novelty can be listed as:

  1. Efficient clocking so that there is no delay between sum and carry outputs.

  2. Selection of the best possible input application to minimize latency.

  3. Reduced cell count.

  4. Reduced energy dissipation.

The working of the proposed design is explained as follows:

Any design optimization in QCA depends on the application of inputs and clocks on the quantum cells. Internally, the 3-input XOR gate structure has two distinct clocks, clock0 and clock1. However, in all quantum cells, the basic MV gate has a single clock. Hence, in the proposed 1-bit full adder design, the outputs sum and carry are made to occur simultaneously, by making the delay equal for both sum and carry outputs. In Fig. 7a, the 1′s and 0′s indicate the clocks on the quantum cells, and the arrows indicate the input and output directions. The XOR gate takes a delay of one clock to execute the sum. The clock1 will be in switching phase when clock0 is in hold phase. It takes a quarter-period of the clock to have the device cell data in XOR gate. The MV gate will undergo the same delay in processing the carry out concurrently with the sum. The clock of the quantum cells in the MV gate must therefore be advanced from clock0 to clock1 in the proposed design, which will generate the execution output with the same delay as the XOR gate sum output.

Fig. 7 a Clocks of the majority voter gate and the XOR gate of the proposed design, b proposed 1-bit full adder block diagram and c proposed 1-bit full adder layout

The next significant issue with the design of quantum cells is how and where the inputs are applied. As the XOR and the MV gates have common inputs, the design can be implemented as shown in Fig. 7b. If these inputs are applied to each gate separately, then more quantum cells are needed. Nonetheless, the cells take the inputs during the clock0 stage in the proposed design.

Given where the inputs are located, input1, input2 and input3 can be applied to any of the cells lying between the two gates. The choice of the quantum cell for a specific input, however, depends on the delay with which the three inputs reach the device cell. The three inputs should reach the device cell simultaneously, so that both sum and carry are obtained at the same time. Figure 7c shows the implementation of the 1-bit full adder. Input A is applied so that it reaches both the XOR and the MV gate at precisely the same time. The B and C inputs are connected in such a way that they also enter both gates simultaneously. Here, input B is closer to the MV gate and input C is closer to the XOR gate. Thus, all inputs reach the device cells of the two gates simultaneously. The proposed full adder layout, which adds three input bits using only 26 quantum cells in an area of 0.02 μm², is depicted in Fig. 7c.
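The logic function realized by this arrangement, a 3-input XOR for the sum and a 3-input majority vote for the carry, can be checked exhaustively with a short behavioural sketch. It models only the Boolean function, not the cell layout or the clocking, and the function names are illustrative.

```python
from itertools import product


def xor3(a, b, c):
    """3-input XOR: produces the sum bit."""
    return a ^ b ^ c


def majority3(a, b, c):
    """3-input majority vote: produces the carry bit."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a, b, c_in):
    """Return (sum, carry_out) of a 1-bit full adder."""
    return xor3(a, b, c_in), majority3(a, b, c_in)


# Truth table as rows of (a, b, c_in, sum, carry_out).
truth_table = [
    (a, b, c, *full_adder(a, b, c)) for a, b, c in product((0, 1), repeat=3)
]

# The outputs must encode the arithmetic sum of the three input bits.
is_correct = all(a + b + c == s + 2 * carry for a, b, c, s, carry in truth_table)
```

Because both outputs depend on the same three inputs, sharing the input cells between the two gates, as described above, does not change the function being computed.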

5.1 Comparison of 1-bit full adders

Comparison of the proposed and several existing full adders is shown in Table 2 to illustrate the improved performance of the proposed full adder design. Table 2 gives the total number of cells and the area that the quantum cells occupy in order to build a 1-bit full adder. It is understood from the table that the proposed 1-bit full adder design is much better with regard to both cell number and latency.

Table 2 Comparison of cells, area and delay of various adders

5.2 Energy dissipation of 1-bit full adder designs

Another important parameter is the energy dissipation of QCA layouts. The energy dissipation of the proposed layout is measured with the QCADesigner-E tool using the coherence vector simulation engine. It is compared with the existing 1-bit full adder layouts in [9], [19] and [26], for which an energy dissipation analysis was carried out. All adders are tested at a temperature of 1 K, and the energy dissipation is given in electron volts. Sum_Ebath describes the overall amount of energy transfer that the electrons need to pass through the whole circuit, and Avg_Ebath specifies the average amount of energy transfer between the cells in the circuit. The outcomes of the analysis are given in Table 3.

Table 3 Energy dissipation of various 1-bit full adders

It is inferred from the table that the proposed adder exhibits desired features at 1 K. The behavior of the adder is also observed at higher temperatures. Table 4 shows the variation in energy dissipation in relation to temperature for the proposed full adder design in QCA. It can be observed that the proposed 1-bit full adder has proven to have very low energy dissipation up to 38 K temperature, keeping all the other parameters common.

Table 4 Energy dissipation study with respect to temperature

The temperature versus energy characteristics are shown in Fig. 8. This indicates that the novel 1-bit full adder has lower energy dissipation compared to the prior methods. It is demonstrated that dissipation of energy can decrease as the temperature rises. The proposed full 1-bit adder thus has excellent performance at 38 K temperature.

Fig. 8 Energy dissipation analysis with respect to temperature

6 Baugh–Wooley 4 × 4 multiplier design

The proposed 1-bit full adder is used to build a 4x4 multiplier, which is another important element in DSP processors. A signed multiplication algorithm, proposed by Baugh–Wooley, is implemented here with the proposed full adder. The Baugh–Wooley multiplier equation is as follows:

Consider two signed numbers as

$$A = - a_{p - 1} 2^{p - 1} + \sum\limits_{i = 0}^{p - 2} {a_{i} 2^{i} }$$
$$B = - b_{p - 1} 2^{p - 1} + \sum\limits_{j = 0}^{p - 2} {b_{j} 2^{j} }$$
$$P = A*B$$
$$= ( - a_{p - 1} 2^{p - 1} + \sum\limits_{i = 0}^{p - 2} {a_{i} 2^{i} } )*( - b_{p - 1} 2^{p - 1} + \sum\limits_{j = 0}^{p - 2} {b_{j} 2^{j} } )$$
$$= a_{p - 1} b_{p - 1} 2^{2p - 2} + \sum\limits_{i = 0}^{p - 2} {\sum\limits_{j = 0}^{p - 2} {a_{i} } b_{j} 2^{i + j} } - \sum\limits_{j = 0}^{p - 2} {a_{p - 1} b_{j} 2^{j} 2^{p - 1} } - \sum\limits_{i = 0}^{p - 2} {a_{i} } b_{p - 1} 2^{i} 2^{p - 1}.$$

In the product terms, there are two negative numbers. They are represented using 2′s complement for the proposed implementation, so that adders can be used for the last two product terms also.
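The four-term expansion of the product can be verified numerically. The sketch below assumes p-bit two's-complement operands and simply evaluates each of the four terms; the helper names are illustrative.

```python
def twos_complement_bits(value, p):
    """Bits of a signed value in p-bit two's complement, least significant first."""
    return [(value >> i) & 1 for i in range(p)]


def expanded_product(a_val, b_val, p):
    """Evaluate the four terms of the expanded product A*B."""
    a = twos_complement_bits(a_val, p)
    b = twos_complement_bits(b_val, p)
    # Product of the two sign bits, weighted by 2^(2p-2).
    sign_term = (a[p - 1] * b[p - 1]) << (2 * p - 2)
    # Partial products of the magnitude bits.
    positive = sum(
        (a[i] * b[j]) << (i + j) for i in range(p - 1) for j in range(p - 1)
    )
    # The two terms that enter with a negative sign.
    negative_1 = sum((a[p - 1] * b[j]) << (j + p - 1) for j in range(p - 1))
    negative_2 = sum((a[i] * b[p - 1]) << (i + p - 1) for i in range(p - 1))
    return sign_term + positive - negative_1 - negative_2


# Exhaustive check for 4-bit signed operands (-8 .. 7).
expansion_holds = all(
    expanded_product(a, b, 4) == a * b
    for a in range(-8, 8)
    for b in range(-8, 8)
)
```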

The first negative term can be handled by denoting it as ‘M’, with bits \(m_{i} = a_{p - 1} b_{i}\):

$$M = \sum\limits_{i = 0}^{p - 2} {m_{i} 2^{i + p - 1} }.$$

The MSB and LSB positions of ‘M’ are \(2^{2p - 3}\) and \(2^{p - 1}\), respectively. As no bits are available at positions \(2^{2p - 1}\) and \(2^{2p - 2}\), zeroes are assumed in these positions:

$$\begin{array}{*{20}c} {2^{2p - 1} } & {2^{2p - 2} } & {2^{2p - 3} } & {2^{2p - 4} } & {2^{2p - 5} } & \cdots & {2^{p + 2} } & {2^{p + 1} } & {2^{p} } & {2^{p - 1} } \\ 0 & 0 & {m_{p - 2} } & {m_{p - 3} } & {m_{p - 4} } & \cdots & {m_{3} } & {m_{2} } & {m_{1} } & {m_{0} } \\ \end{array}$$

Now performing 1′s complement, the bits are

$$\begin{array}{*{20}c} 1 & 1 & {\bar{m}_{p - 2} } & {\bar{m}_{p - 3} } & {\bar{m}_{p - 4} } & \cdots & {\bar{m}_{3} } & {\bar{m}_{2} } & {\bar{m}_{1} } & {\bar{m}_{0} } \\ \end{array}$$

For 2′s complement, ‘1’ is added at the LSB, so that the bits are

$$\begin{array}{*{20}c} 1 & 1 & {\bar{m}_{p - 2} } & {\bar{m}_{p - 3} } & {\bar{m}_{p - 4} } & \cdots & {\bar{m}_{3} } & {\bar{m}_{2} } & {\bar{m}_{1} } & {\bar{m}_{0} + 1} \\ \end{array}$$

Similarly, for the other negative term, ‘L’, with bits \(l_{i} = a_{i} b_{p - 1}\), the 2′s complement can be written as

$$\begin{array}{*{20}c} 1 & 1 & {\overline{l}_{p - 2} } & {\overline{l}_{p - 3} } & {\overline{l}_{p - 4} } & \cdots & {\overline{l}_{3} } & {\overline{l}_{2} } & {\overline{l}_{1} } & {\overline{l}_{0} + 1} \\ \end{array}$$

The addition of bits at positions \(2^{2p - 1}\) and \(2^{2p - 2}\) is given by:

$$\begin{array}{*{20}c} {} & {2^{2p - 1} } & {2^{2p - 2} } \\ {} & 1 & 1 \\ {} & 1 & 1 \\ \hline 1 & 1 & 0 \\ \end{array}$$

The additional carry is discarded, and for the \(2^{2p - 2}\) bit, 1 is added. At the LSB, the two 1′s are added and a carry is generated which is added in the \(2^{p - 1}\) position. Therefore, the multiplication algorithm can be simplified using the mathematical analysis mentioned above. The terms associated with the negative signs (that is, most significant bits) will have to be complemented.

Consider the Baugh–Wooley multiplication algorithm for 4 × 4 multiplication, as shown in Fig. 9a.

Fig. 9 a Baugh–Wooley multiplication and b implementation algorithm using the proposed adder

The proposed full adder is used to add the product terms generated at the intermediate level. As in Fig. 9b, the product terms kept in a box are added by the proposed full adder. The output sum will be taken, and carry will be propagated to the succeeding stages. The last carry generated by the full adder needs to be added to 1. However, instead of addition, the carry bit can be simply complemented using an inverter.
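A behavioural sketch of the addition-only form of the algorithm is given below. It follows a common textbook formulation of the Baugh–Wooley scheme, in which the partial products involving exactly one sign bit are complemented and the constants 2^p and 2^(2p−1) are added, with the result taken modulo 2^(2p). This is an assumption for illustration and may differ in detail from the arrangement in Fig. 9; all function names are hypothetical.

```python
def twos_complement_bits(value, p):
    """Bits of a signed value in p-bit two's complement, least significant first."""
    return [(value >> i) & 1 for i in range(p)]


def baugh_wooley(a_val, b_val, p=4):
    """Signed p x p multiplication using additions only.

    Returns the 2p-bit product word as an unsigned integer.
    """
    a = twos_complement_bits(a_val, p)
    b = twos_complement_bits(b_val, p)
    total = (a[p - 1] & b[p - 1]) << (2 * p - 2)
    # Ordinary partial products of the magnitude bits.
    total += sum(
        (a[i] & b[j]) << (i + j) for i in range(p - 1) for j in range(p - 1)
    )
    # Partial products involving exactly one sign bit are complemented.
    total += sum((1 - (a[p - 1] & b[j])) << (j + p - 1) for j in range(p - 1))
    total += sum((1 - (a[i] & b[p - 1])) << (i + p - 1) for i in range(p - 1))
    # Correction constants left over from the two 2's complement operations.
    total += (1 << p) + (1 << (2 * p - 1))
    return total & ((1 << (2 * p)) - 1)  # discard carries beyond 2p bits


def to_signed(word, width):
    """Interpret an unsigned `width`-bit word as a two's-complement number."""
    return word - (1 << width) if word >> (width - 1) else word


# Exhaustive check of the 4 x 4 case against ordinary multiplication.
all_products_match = all(
    to_signed(baugh_wooley(a, b, 4), 8) == a * b
    for a in range(-8, 8)
    for b in range(-8, 8)
)
```

In hardware, each column of these partial products is reduced with full adders, so the multiplier needs no subtractors.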

Figure 10 displays the QCA layout of the 4x4 Baugh–Wooley multiplier designed by means of the proposed full adder. The multiplier is constructed over an area of 1.64 μm2 with a total of 1638 quantum cells.

Fig. 10 Layout of proposed 4 × 4 Baugh–Wooley multiplier using the proposed novel full adder design

7 Simulation results and discussion

Figure 11 displays the simulation results of the Baugh–Wooley algorithm for different input values (the values are represented in decimal format for ease of understanding). Two cases are discussed here:

Fig. 11 Simulation results of Baugh–Wooley multiplier implemented using proposed method

  • Case 1

  • Let A = 0110 and B = 0111.

  • As the MSBs of both numbers are 0, neither A nor B needs to be converted to 2′s complement. The two numbers are multiplied directly, giving 00101010, which is 42 in decimal.

  • Case 2

  • Consider the numbers 0001 and 1111 in sign-magnitude form, that is, 1 and −7 in decimal. Since the second number is negative, it is applied in 2′s complement form as 1001. The expected product in sign-magnitude form is 10000111, that is, −7 in decimal.

  • According to the algorithm, the multiplication is as follows:

  • The result of the algorithm is 249, as shown in Fig. 11, which is in 2′s complement form (11111001). Rewriting it in sign-magnitude form gives 10000111 = −7.

The simulation results displayed in Fig. 11 indicate that the output is obtained after 6.75 clock cycles. The latency of the result is due to the clock phases used in the quantum cells in the design of the multiplier. Table 5 presents the cell count, area and propagation delay of the proposed and existing QCA multipliers. As can be seen, the delay is slightly greater than that of the existing Baugh–Wooley 4 × 4 design [33]. Nonetheless, the proposed design is efficient: relative to the 1982 cells and 1.8 μm² reported for [33], it reduces the QCA cell count by 17.4% and the area by about 9%.

Table 5 Comparison of the existing and proposed QCA multipliers
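The bit patterns in the two cases can be reproduced with a short sketch that converts between signed integers and fixed-width two's-complement strings. The helper names are illustrative; the code only checks the number representations and does not simulate the QCA layout.

```python
def to_twos_complement(value, width):
    """Signed integer -> two's-complement bit string of the given width."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def from_twos_complement(bit_string):
    """Two's-complement bit string -> signed integer."""
    word = int(bit_string, 2)
    if bit_string[0] == "1":
        word -= 1 << len(bit_string)
    return word


# Case 1: 6 x 7, both operands positive.
case1_bits = to_twos_complement(6 * 7, 8)

# Case 2: 1 x (-7). The operand -7 is applied as a 4-bit two's-complement word.
operand_bits = to_twos_complement(-7, 4)
case2_bits = to_twos_complement(1 * -7, 8)

# Reading the 8-bit output as an unsigned number gives the value seen in the waveform.
case2_unsigned = int(case2_bits, 2)
case2_signed = from_twos_complement(case2_bits)
```

Reading the same eight output bits as unsigned or as signed explains why the simulator shows 249 for a product of −7.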

8 Power analysis of the proposed Baugh–Wooley multiplier

The power dissipation of a quantum logic circuit has been calculated based on clock and quantum cells. In [37], the power dissipation of the XOR gate was calculated using a simple QCAPro power dissipation tool with different kink energies. In [38], the power analysis of multiplexers or any quantum circuits was performed using Hartree–Fock mean-field approximation procedure. In [39] and [40], the power dissipation was calculated using Hamming distance between the successive bits and kink energy. The count of MV gates and NOT gates was also involved. In this paper, the power dissipated by the proposed multiplier is computed from the outputs derived from the QD-E tool (Table 6).

Table 6 Comparison of power dissipations

In this paper, using QD-E tool, the energy dissipation is obtained through the coherence vector simulation engine setup in eV. Using this, the power dissipation is calculated as follows:

Total energy dissipation of 4 × 4 multiplier (in eV) is: 4.10e-001 eV (error: ± −3.08e−002 eV).

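One electron volt is converted to joules using the factor 1.607 × 10⁻¹⁹ J/eV (the exact value is 1.602 × 10⁻¹⁹ J/eV; the difference changes the result by about 0.3%).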

Thus, the total energy of 4 × 4 multiplier (in Joules) is

$$\begin{aligned} {\text{Energy}} & = 4.1 \times 10^{ - 1} \times 1.607 \times 10^{ - 19} \;{\text{Joules}} \\ & = 6.588 \times 10^{ - 20} \;{\text{Joules}} \\ \end{aligned}$$
(1)

Now, the power dissipation can be calculated with basic formula

$$E \, = \, P_{\text{d}} \times \, T,$$
(2)

where E is energy dissipation, Pd is power dissipation, and T is the time interval between input and output or latency.

The latency of the 4 × 4 multiplier is 6 + 3/4 clock cycles. Each clock cycle has a clock period of 4 × 10⁻¹² s. Thus, the total delay is

$$\begin{aligned} T & = \left( {6 + 3/4} \right) \times 4 \times 10^{ - 12} \\ & = 27 \times 10^{ - 12} \;{\text{seconds}} \\ \end{aligned}$$
(3)

Therefore, the power dissipation can be calculated from (1) and (3) as

$$\begin{aligned} P_{\text{d}} & = {\text{Energy/Time}} = \left( {6.588 \times 10^{ - 20} } \right)/\left( {27 \times 10^{ - 12} } \right) \\ & = 2.44\;{\text{nW}}. \\ \end{aligned}$$
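The calculation in (1) to (3) can be reproduced with a few lines of code. The sketch below uses the conversion factor from the text (1.607 × 10⁻¹⁹ J per eV) to match the reported figure and, for comparison, the exact SI value of 1.602176634 × 10⁻¹⁹ J per eV. The function and constant names are illustrative.

```python
EV_TO_JOULE_TEXT = 1.607e-19         # factor used in the calculation above
EV_TO_JOULE_EXACT = 1.602176634e-19  # exact SI value of one electron volt


def power_dissipation(energy_ev, latency_cycles, clock_period_s, ev_to_joule):
    """Average power in watts: P_d = E / T."""
    energy_j = energy_ev * ev_to_joule
    latency_s = latency_cycles * clock_period_s
    return energy_j / latency_s


# Values from the text: 4.10e-1 eV, 6.75 clock cycles, 4 ps clock period.
p_text = power_dissipation(4.10e-1, 6.75, 4e-12, EV_TO_JOULE_TEXT)
p_exact = power_dissipation(4.10e-1, 6.75, 4e-12, EV_TO_JOULE_EXACT)

print(f"with the factor from the text: {p_text * 1e9:.2f} nW")
print(f"with the exact factor:         {p_exact * 1e9:.2f} nW")
```

The two conversion factors differ by about 0.3%, so the result stays at roughly 2.4 nW either way.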

From Fig. 12, it is clear that the power dissipation of the proposed multiplier is much lower than that of the existing multipliers. The 4 × 4 multiplier designed with the proposed 1-bit full adder dissipates only 2.44 nW, which is considerably less than the power dissipation of the Wallace tree and Baugh–Wooley multipliers developed in [29], [41], [42] and [43], where the power dissipation is 22.62 µW, 4.11 µW, 67.38 mW, 4.6 µW and 0.49 W, respectively.

Fig. 12 Power dissipation of the existing and proposed multipliers in mW

9 Conclusion

As quantum-dot cellular automata is one of the potential future nanotechnologies, digital circuits employed in areas such as signal and image processing can be implemented with it. This paper presents a novel 1-bit full adder that uses fewer quantum cells in a very small area of 0.02 µm² with a propagation delay of 0.5 clock cycles. The performance metrics are improved by 8% in the total cells used and with 75% area utilization. The energy dissipation of the proposed full adder is also low: it is of the order of 2.23e−02 eV at 1 K, a 4% improvement, and 1.01e−02 eV at 38 K, a 56.7% improvement compared to the existing 1-bit full adder designs. A 4 × 4 Baugh–Wooley multiplier is implemented using the proposed 1-bit full adder, which also demonstrates better performance. The tabulated results show that the new design requires only 1638 quantum cells within an area of 1.64 µm². The energy dissipation of the multiplier is just 4.10e−001 eV at 1 K. Unlike the other existing designs, the proposed multiplier circuit is reliable even at 100 K. In addition, the power dissipation calculated for the 4 × 4 multiplier design is only 2.44 nW.