September 13-15, 2004 Poznań, POLAND

# ICSES'04



# 200 mV Full Adder Based on a Reconfigurable CMOS Perceptron

Snorre Aunet<sup>#</sup>, Bengt Oelmann<sup>##</sup>, Ola G. Lein<sup>###</sup>, Yngvar Berg<sup>#</sup>

#Department of Informatics, University of Oslo, Postbox 1080 Blindern, N-0316 Oslo, Norway.

##Department of Information Technology and Media, Mid-Sweden University, SE-851 70 Sundsvall, Sweden. e-mail: Bengt.Oelmann@mh.se

###Department of Computer and Information Science, Norwegian University of Science and Technology, Sem Sælands veg 7-9, NO-7034 Trondheim, Norway.

Abstract—This paper presents a full adder based on identical, real-time reconfigurable CMOS perceptron circuits operating in subthreshold. The perceptron consists of inverters with wired outputs and is configured through substrate biasing. The full adder is demonstrated by schematic-level simulations for a 0.12 $\mu$m CMOS process. Functionality is shown for power supply voltages of 200-400 mV. The minimum power consumption is 15.5 nW, with a power-delay product of 6.1 fJ, at a $V_{dd}$ of 200 mV.

Keywords—subthreshold, perceptron, adder, CMOS

# I. INTRODUCTION

To exploit future CMOS technologies, power dissipation must be reduced for several reasons [1]. One is that power density is becoming intractable: unless proper action is taken, the predicted power dissipation per unit area will be of the same order as that of the surface of the sun by 2010 [1]. Reduced power dissipation may also increase circuit reliability and improve battery lifetime. Subthreshold (weak inversion) circuits may consume less power than other known low-power circuits, including energy recovery logic [2]. Subthreshold circuits typically operate at a relatively low $V_{dd}$, which is attractive from a low-power viewpoint, since reducing the supply voltage offers the most direct and dramatic way of reducing power consumption [3]. We present a FULL-ADDER that makes extensive use of a single basic building block [4], [5]. This reduces the number of basic building blocks that need to be considered during design and also improves matching between components, hopefully leading to increased yield. The basic circuit is a perceptron, or threshold logic circuit. An in-depth survey of VLSI implementations of threshold logic can be found in [6].

Fig. 1. The 3 inverters to the right produce a low output if, and only if, at least 1, 2 or 3 high (binary) inputs are applied to [X,Y,Z]. The voltage on the wells controls the threshold, and thereby selects the functionality.

The circuit is real-time reconfigurable: the implemented function can be changed during operation by changing the voltage on a control input, which in this case is the transistor wells. For the twin-well hcmos9gp process used here, available through Circuits Multi Projets, the wells of the PMOS and NMOS transistors are tied together, inspired by [7]. The basic circuit contains three inverters, like the one to the left in figure 1, wired together like those to the right. When the well voltage is at $V_{ss} = 0$ V, the circuit computes the 3-input NAND function of the inputs X, Y and Z; when it is at $V_{dd}$, the circuit computes the 3-input NOR function. If the well voltage is about $V_{dd}/2$, it computes the inverted CARRY function.
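At the logic level, the three configurations amount to a threshold function: the output goes low when at least T of the three inputs are high, with T = 3, 2 or 1. The following is a minimal Python sketch of that abstraction only. The function name `perceptron3` and the integer `threshold` parameter are illustrative assumptions, and the model says nothing about the analog behaviour of the circuit.

```python
from itertools import product

def perceptron3(x, y, z, threshold):
    """Logic-level model of the 3-inverter perceptron (hypothetical helper).

    The output is low (0) when at least `threshold` of the inputs are high.
    threshold=3 models NAND3 (wells near Vss), threshold=2 models the
    inverted CARRY (wells near Vdd/2), threshold=1 models NOR3 (wells near Vdd).
    """
    return 0 if (x + y + z) >= threshold else 1

# Check each configuration against its Boolean definition for all 8 inputs.
for x, y, z in product((0, 1), repeat=3):
    carry = (x & y) | (x & z) | (y & z)               # majority of the inputs
    assert perceptron3(x, y, z, 3) == 1 - (x & y & z)  # NAND3
    assert perceptron3(x, y, z, 2) == 1 - carry        # inverted CARRY
    assert perceptron3(x, y, z, 1) == 1 - (x | y | z)  # NOR3
```

Treating the well voltage as a discrete threshold is a simplification chosen here for clarity; in the circuit the threshold is set by a continuous bias voltage.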

The paper is organized as follows: After this introduction there is a small analysis of the 3-inverter circuit. Then the FULL-ADDER is explained. Simulated results come before a discussion.

# II. SIMPLE ANALYSIS OF THE BASIC THRESHOLD GATE

#### A. MOS transistors in subthreshold

For an NMOS transistor in subthreshold we have [8]

$$I_{ds,n} = I_{0}\exp\left\{\frac{\kappa V_{gs}}{U_{t}}\right\}\exp\left\{(1-\kappa)\frac{V_{bs}}{U_{t}}\right\}\left(1-\exp\left\{\frac{-V_{ds}}{U_{t}}\right\}+\frac{V_{ds}}{V_{0}}\right) \quad (1)$$

expressing the current between drain and source. $I_0$ is the zero-bias current for the given device, a constant in which all the pre-exponential constants have been absorbed, including the channel width ("W") and length ("L") of the MOSFET structure. $V_{gs}$ is the gate-to-source potential, $V_{ds}$ the drain-to-source potential and $V_{bs}$ the substrate-to-source potential. $V_0$ is the Early voltage, which is proportional to the channel length. $\kappa$ expresses how effectively the gate potential controls the channel current. It is often around 0.7-0.75 [8]. The thermal voltage is $U_t = kT/q$, where $k = 1.38 \cdot 10^{-23}$ J/K is Boltzmann's constant and $q = 1.602 \cdot 10^{-19}$ C is the elementary charge. At room temperature, T = 300 K, $U_t = 25.8$ mV. When $V_{ds} \geq 4U_t$, neglecting the Early effect, we get

$$I_{ds} = I_0 \exp\left\{\frac{\kappa V_{gs}}{U_t}\right\} \exp\left\{(1-\kappa)\frac{V_{bs}}{U_t}\right\}. \quad (2)$$
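The condition $V_{ds} \geq 4U_t$ can be motivated numerically: $1 - e^{-4} \approx 0.98$, so the drain-voltage factor in equation (1) is then within about 2 % of unity when the Early term is small. The sketch below evaluates both equations in Python. The values of `i0` and `v0` are placeholders chosen only for illustration and are not device parameters from this work; `kappa = 0.7` is taken from the range quoted above.

```python
import math

K_BOLTZMANN = 1.380649e-23      # Boltzmann's constant [J/K]
Q_ELEMENTARY = 1.602176634e-19  # elementary charge [C]

def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage U_t = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELEMENTARY

def ids_eq1(vgs, vds, vbs, i0=1e-15, kappa=0.7, v0=10.0, temp_kelvin=300.0):
    """Equation (1): subthreshold NMOS drain-source current in amperes.

    i0 and v0 are placeholder values for illustration only.
    """
    ut = thermal_voltage(temp_kelvin)
    gate_term = math.exp(kappa * vgs / ut)
    body_term = math.exp((1.0 - kappa) * vbs / ut)
    drain_term = 1.0 - math.exp(-vds / ut) + vds / v0
    return i0 * gate_term * body_term * drain_term

def ids_eq2(vgs, vbs, i0=1e-15, kappa=0.7, temp_kelvin=300.0):
    """Equation (2): approximation for V_ds >= 4 U_t, Early effect neglected."""
    ut = thermal_voltage(temp_kelvin)
    return i0 * math.exp(kappa * vgs / ut) * math.exp((1.0 - kappa) * vbs / ut)

if __name__ == "__main__":
    ut = thermal_voltage()
    ratio = ids_eq1(0.2, 4.0 * ut, 0.0) / ids_eq2(0.2, 0.0)
    print(f"U_t = {ut * 1e3:.2f} mV, eq.(1)/eq.(2) at V_ds = 4 U_t: {ratio:.3f}")
```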

It is clear that the substrate is a terminal that can control the drain-source current to a significant extent. It is claimed in [8] that taking only $V_{gs}$ into account is sufficient for many designs, and that when $V_{ds} \geq 4U_t$, the transconductance, $g_m$, and output conductance, $g_{dsat}$, can be expressed as

$$g_m = \frac{\Delta I_{ds}}{\Delta V_{gs}} = \frac{\kappa I_{ds}}{U_t} \quad , \quad g_{dsat} = \frac{\Delta I_{ds}}{\Delta V_{ds}} = \frac{I_{ds}}{V_0}. \quad (3)$$

Both $V_{gs}$ and $V_{bs}$ control the current levels, as shown in equation (2), and thereby the output resistance, $R_{out}$, of each transistor. The output resistance of a device is the inverse of the output conductance given by equation (3). The relative "strength" of the gate and substrate voltages is determined by $\kappa$. The PMOS can be described by similar equations with opposite polarities.
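One way to make the relative strength concrete is to ask how much each voltage must change to alter $I_{ds}$ by a factor of ten under equation (2): $U_t \ln 10 / \kappa$ for the gate and $U_t \ln 10 / (1-\kappa)$ for the substrate. For $\kappa = 0.7$ and $U_t = 25.8$ mV this gives roughly 85 mV and 200 mV per decade, so within this simplified model the substrate is a weaker but still significant control terminal. The Python sketch below computes these quantities; the helper names are illustrative assumptions.

```python
import math

U_T = 25.8e-3  # thermal voltage at 300 K [V], value quoted in the text

def volts_per_decade_gate(kappa, ut=U_T):
    """Change in V_gs that multiplies I_ds by 10 according to equation (2)."""
    return ut * math.log(10.0) / kappa

def volts_per_decade_body(kappa, ut=U_T):
    """Change in V_bs that multiplies I_ds by 10 according to equation (2)."""
    return ut * math.log(10.0) / (1.0 - kappa)

def transconductance(ids, kappa, ut=U_T):
    """g_m from equation (3), in siemens, for a drain current ids in amperes."""
    return kappa * ids / ut

if __name__ == "__main__":
    for kappa in (0.70, 0.75):  # range quoted from [8]
        gate_mv = volts_per_decade_gate(kappa) * 1e3
        body_mv = volts_per_decade_body(kappa) * 1e3
        print(f"kappa={kappa}: gate {gate_mv:.1f} mV/dec, body {body_mv:.1f} mV/dec")
```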

Adjusting the substrate voltages is exploited to change the threshold, and thereby the functionality, of the compound circuit. Each inverter acts as a voltage divider between $V_{dd}$ and 0 V, trying to force the output to a level determined by the voltages on its nodes, as illustrated in figure 2. The actual output voltage is then determined by the compound 3-inverter circuit as a whole. Simulations demonstrating this building block, the perceptron, are presented in the next section.



Fig. 2. Reconfigurable 3-inverter perceptron.

#### B. FULL-ADDER circuit

The truth table for the FULL-ADDER is shown in figure 3. The previously unpublished FULL-ADDER presented here [9] exploits the fact that SUM and CARRY are the opposites of each other, except for the input vectors [X,Y,Z]=[0,0,0] and [X,Y,Z]=[1,1,1]. When the "ZorO" signal in figure 8 goes high (is close to $V_{dd}$), the inputs are either [0,0,0] or [1,1,1]. In this case the SUM and CARRY nodes are identical and also equal to any of the inputs. Otherwise, SUM and CARRY have opposite binary values.

| X | Y | Z | SUM | CARRY |
|---|---|---|-----|-------|
| 0 | 0 | 0 | 0   | 0     |
| 0 | 0 | 1 | 1   | 0     |
| 0 | 1 | 0 | 1   | 0     |
| 0 | 1 | 1 | 0   | 1     |
| 1 | 0 | 0 | 1   | 0     |
| 1 | 0 | 1 | 0   | 1     |
| 1 | 1 | 0 | 0   | 1     |
| 1 | 1 | 1 | 1   | 1     |

Fig. 3. Truth table for the circuits in figure 2 and figure 7. Voltage on node S in figure 7 is $V_{dd}/2$. Adjusting this voltage can make the circuit implement different binary functions.

A multiplexer function is therefore used to choose between letting one of the inputs propagate to form the SUM output and calculating SUM from the inverted CARRY (CARRY') value.
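The selection principle can be checked with a small behavioural model. The Python sketch below is an illustration at the Boolean level only: it does not reproduce the gate-level netlist of the 3-inverter circuits, and the function names (`carry`, `zero_or_one`, `full_adder`) are assumptions introduced here.

```python
from itertools import product

def carry(x, y, z):
    """CARRY is the majority of the three inputs."""
    return 1 if (x + y + z) >= 2 else 0

def zero_or_one(x, y, z):
    """Models the "ZorO" signal: high only for inputs [0,0,0] or [1,1,1]."""
    return 1 if (x + y + z) in (0, 3) else 0

def full_adder(x, y, z):
    """Return (SUM, CARRY) using the multiplexer principle described above."""
    c = carry(x, y, z)
    # Multiplexer: pass an input through when ZorO is high, else use CARRY'.
    s = x if zero_or_one(x, y, z) else 1 - c
    return s, c

# Verify against binary addition for all 8 input combinations.
for x, y, z in product((0, 1), repeat=3):
    s, c = full_adder(x, y, z)
    assert 2 * c + s == x + y + z
```

Any of the three inputs could be passed through in the ZorO case, since they are all equal there; `x` is used here arbitrarily.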

# III. SIMULATED RESULTS

#### A. Transient simulations for the 3-inverter circuit

An implementation in a 0.12 $\mu$m CMOS technology has been simulated, with transistors dimensioned as in figure 5. The inherent threshold voltages were 380 mV for the NMOS and -390 mV for the PMOS. The supply voltage was 300 mV and the voltage on the wells took four different values: $V_{ss}$ = 0 V, 100 mV, 200 mV and $V_{dd}$ = 300 mV. For each of the four well voltages, all eight possible combinations of the inputs X, Y and Z were applied. The resulting output voltage is shown in figure 6. For the first 25 % of the time the NAND3 function is computed, from 25 % to 75 % the CARRY' function, and thereafter the NOR3 function.

Fig. 4. The principle behind the FULL-ADDER. This schematic contains ten 3-inverter circuits. A "3" on the symbol for such a circuit means that 3 high inputs are needed to force the output low, which equals the 3-input NAND function. A "1" means that it implements the 3-input NOR function, while "2" means CARRY'. Shorting two or all three inputs provides the NAND2 or INVERT function, respectively.

Fig. 5. NOR, CARRY' and NAND circuit.

#### B. Transient simulations for a FULL-ADDER

The previously mentioned FULL-ADDER principle was implemented, but with one extra element used to improve some intermediate signal levels. The extra element is included in figure 7 and can be found rightmost in the third row from the top.

Fig. 6. Simulation of 3 inverters with wired outputs. The 3-input NAND, inverted CARRY and 3-input NOR functionalities are shown. The inverted CARRY function results for well voltages of 100 mV and 200 mV.

Fig. 7. Testbench for the FULL-ADDER using Cadence tools. Eleven 3-inverter circuits are included.

Fig. 8. Simulation of a FULL-ADDER made from 3-input circuits only. The 8 possible combinations of the three inputs are applied.

Simulated results are shown in figure 8, with the CARRY and SUM nodes as the top two signals and the three input signals as the bottom three. The signal that detects all-zero or all-one inputs is shown third from the top.

#### C. Logic level, Power consumption, PDP

According to our simulations, the SUM output for three high inputs was the most critical case regarding the logic level. For a $V_{dd}$ of 175 mV the high output level was only 58 % of the ideal, or about 102 mV. Similar simulations were done for $V_{dd}$ values of 200 mV, 300 mV, 400 mV and 600 mV. In the 600 mV case the circuit malfunctioned. The results for the SUM node can be found in figure 9.

Eight transitions, instead of the 44 used in [10], were applied, so the Power-Delay-Product (PDP) is only a rough estimate. The SUM and CARRY outputs were each loaded with 3-inverter circuits. The average current, $I_{avr}$, was extracted with the circuit running at about full speed, just allowing the signals to reach their full swing. The propagation times from low to high signal level, $t_{pLH}$, were extracted at the same time; they were in general slightly slower than the fall times.

| $V_{dd}$ [mV] | H [%] | $t_{pLH}$ [s] | $I_{avr}$ [A] | Power [W] | PDP [J] |
|---------------|-------|---------------|---------------|-----------|---------|
| 175           | 58    | 1.9 $\mu$     | 3.57 n        | 6.25 n    | 11.9 f  |
| 200           | 75    | 0.39 $\mu$    | 77.7 n        | 15.5 n    | 6.1 f   |
| 300           | 85    | 71 n          | 1.09 $\mu$    | 327 n     | 23.2 f  |
| 400           | 79    | 15.1 n        | 8.6 $\mu$     | 3.4 $\mu$ | 51.3 f  |

Fig. 9. H [%] is the high output level of the SUM node as a percentage of $V_{dd}$.
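The tabulated power and PDP values are consistent with the relations Power = $V_{dd} \cdot I_{avr}$ and PDP = Power $\cdot\, t_{pLH}$. These relations are not stated explicitly in the text, so they are an assumption here. The Python sketch below recomputes both quantities for the rows in the 200 mV to 400 mV range in which functionality is reported, allowing for rounding in the table.

```python
# Rows from figure 9 in SI units:
# (Vdd [V], t_pLH [s], I_avr [A], tabulated power [W], tabulated PDP [J])
ROWS = [
    (0.200, 0.39e-6, 77.7e-9, 15.5e-9, 6.1e-15),
    (0.300, 71e-9, 1.09e-6, 327e-9, 23.2e-15),
    (0.400, 15.1e-9, 8.6e-6, 3.4e-6, 51.3e-15),
]

def power(vdd, i_avr):
    """Average power, assuming Power = Vdd * I_avr."""
    return vdd * i_avr

def pdp(power_w, t_plh):
    """Power-delay product, assuming PDP = Power * t_pLH."""
    return power_w * t_plh

def relative_error(value, reference):
    """Relative deviation of value from reference."""
    return abs(value - reference) / reference

for vdd, t_plh, i_avr, p_tab, pdp_tab in ROWS:
    p_calc = power(vdd, i_avr)
    pdp_calc = pdp(p_tab, t_plh)  # uses the tabulated (rounded) power
    # Tabulated values are rounded, so allow a 2 % tolerance.
    assert relative_error(p_calc, p_tab) < 0.02
    assert relative_error(pdp_calc, pdp_tab) < 0.02
```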

| CMOS [$\mu$m] | #tr. | Power [W] | PDP [J] | Reference   |
|---------------|------|-----------|---------|-------------|
| 0.12          | 66   | 15.5 n    | 6.1 f   | [this work] |
| 0.18          | 12   | 47 n      | 0.028 f | [11]        |
| 0.35          | 16   | 63 $\mu$  | 18.4 f  | [10]        |
| 0.6           | 8    | 0.52 n    | 2.3 f   | [12]        |

Fig. 10. Comparison of different FULL-ADDERs. Where a cited work includes several FULL-ADDERs, only the one with the best PDP is included.

# IV. DISCUSSION

Simulations such as those in figure 8 demonstrate the functionality of the circuit. It had problems producing a logic one at a supply voltage of 175 mV, as can be seen in figure 9. It also malfunctioned for a $V_{dd}$ of 600 mV, probably because some transistors then left the subthreshold region and could no longer benefit from the higher relative transconductance in this region compared to the classical above-threshold domain.

Simulations indicate a better PDP for a $V_{dd}$ of 200 mV than for the other supply voltages in figure 9. Figure 10 compares our circuit with some other CMOS implementations, though better or more interesting results may exist. Our circuit compares favourably to these regarding power consumption, probably mostly due to the more aggressive technology and the low $V_{dd}$, since power supply voltage reduction offers the most direct and dramatic means of reducing power consumption [3]. The circuit from [12] is a subthreshold floating-gate implementation with only 0.5 nW power consumption, but it cannot be called "standard CMOS". The best circuit from [11] operated from a $V_{dd}$ of 1.8 V.

The Power-Delay-Product numbers are not impressive. At 6.1 fJ, our PDP is about two orders of magnitude worse than the 0.028 fJ of the best implementation in figure 10 [11]. Since power consumption depends linearly on the physical capacitance being switched [3], one might give up the matching benefits of using identical elements only and search for a simpler topology than our 66-transistor version. Subthreshold-only solutions may still be attractive, since subthreshold circuits may consume less power than other known low-power circuits [2].

The simulations should be repeated based on layout, including parasitics, which will probably make the results somewhat weaker; measurements from chip prototypes would be preferable. The biasing used here is based on short-circuiting the wells of the PMOS and NMOS transistors. A biasing scheme where they are split could be investigated to see whether it improves the results.

# V. CONCLUSIONS

A FULL-ADDER circuit built only from identical 6-transistor building blocks has been demonstrated through simulations. Each 6-transistor building block can compute one of the NOR3, CARRY' and NAND3 functions at a time, depending on the substrate/well biasing.

The circuit has been demonstrated for power supply voltages in the 200 mV to 400 mV range, with a minimum power consumption of 15.5 nW and a power-delay-product of 6.1 fJ, according to simulations. The power consumption in particular might compare well to many other implementations. The design space should be explored for more effective solutions, both in terms of power consumption and PDP.

# References

- [1] P. P. Gelsinger, Microprocessors for the new millennium; Challenges, Opportunities, and New Frontiers, Digest of Technical Papers, IEEE Symposium on Circuits and Systems, pp. 22–25, 2001.
- [2] H. Soeleman, K. Roy, B. C. Paul, Robust Subthreshold Logic for Ultra-Low Power Operation, *IEEE Trans. on VLSI*, Vol. 9, pp. 90–99, Feb. 2001.
- [3] J. Rabaey, M. Pedram, P. Landman, in *Low Power Design Methodologies*, Kluwer Academic Publishers, Boston, 1995.
- [4] S. Aunet, *Kretselement*, Norwegian patent application number 20035537, filed December 11, 2003.
- [5] S. Aunet, B. Oelmann, S. Abdalla, Y. Berg, Reconfigurable Subthreshold CMOS Perceptron, Proceedings of the IEEE Int'l. Joint Conference on Neural Networks, Budapest, Hungary, July 2004.
- [6] V. Beiu, J. M. Quintana, M. J. Avedillo, VLSI Implementations of Threshold Logic - a Comprehensive Survey, *IEEE Trans. on Neural Networks*, Vol. 14, pp. 1217–1243, Sept. 2003.
- [7] A. Bryant, J. Brown, P. Cottrell, M. Ketchen, J. Ellis-Monaghan, E. J. Nowak, Device Research Conference, Notre Dame, USA, June 2001.
- [8] A. G. Andreou, K. A. Boahen, P. O. Pouliquen, A. Pavasovic, R. E. Jenkins, K. Strohben, Current-Mode Subthreshold MOS Circuits for Analog VLSI Neural Systems, *IEEE Trans. on Neural Networks*, Vol. 2, pp. 205–213, March 1991.
- [9] O. Lein, Ny Ultra lav-effekt FULL-ADDER basert på CMOS Perceptron, project work for the subject TDT4720 Datamaskinkonstruksjon og -arkitektur, fordypningsemne, Norwegian University of Science and Technology, Norway, May 2004.
- [10] A. M. Shams, M. A. Bayoumi, A Framework for Fair Performance of 1-bit Full Adder Cells, Proceedings of the 42nd Midwest Symposium on Circuits and Systems, Vol. 1, pp. 6–9, Las Cruces, NM, USA, August 1999.
- [11] G. Tesanovic, Performance Analysis and Implementation of Full Adder Cells Using 0.18 μm CMOS Technology, Bachelor's thesis, ISY, Linköping University, Sweden, December 2003.
- [12] S. Aunet, Y. Berg, T. Sæther, Real-time Reconfigurable Linear Threshold Elements Implemented in Floating-Gate CMOS, *IEEE Trans. on Neural Networks*, Vol. 14, pp. 1244–1256, Sept. 2003.