

# An 800-MHz Mixed- $V_T$ 4T IFGC Embedded DRAM in 28-nm CMOS Bulk Process for Approximate Storage Applications

Robert Giterman, Alexander Fish, Member, IEEE, Narkis Geuli, Elad Mentovich, Andreas Burg, Member, IEEE, and Adam Teman

**Abstract**—Gain-cell embedded DRAM (GC-eDRAM) is an attractive alternative to traditional static random access memory (SRAM) due to its high density, low leakage, and inherent two-ported operation, yet its dynamic nature leads to limited retention time and calls for periodic, power-hungry refresh cycles. This drawback is further aggravated in scaled technologies, where increased leakage currents and decreased in-cell storage capacitances lead to accelerated data integrity deterioration. The emerging approximate computing paradigm utilizes the inherent error-resilience of different applications to tolerate some errors in the stored data. Such error tolerance can be exploited to reduce the refresh rate in GC-eDRAM to achieve a substantial decrease in power consumption at the cost of an increase in cell failure probability. In this paper, we present the first fabricated and fully functional GC-eDRAM in a 28-nm bulk CMOS technology. The array, which is based on a novel mixed-$V_T$ four-transistor (4T) gain cell with internal feedback (IFGC) optimized for high performance, features a small silicon footprint and supports high-performance operation. The proposed memory can be used with conservative (i.e., 100% reliable) computing paradigms, but also in the context of approximate computing, featuring a small silicon footprint and random access bandwidth. Silicon measurements demonstrate successful operation at 800 MHz under a 900-mV supply while retaining between 30% and 45% lower bitcell area than a single-ported six-transistor (6T) SRAM and a two-ported eight-transistor (8T) SRAM in the same technology.

**Index Terms**—Approximate computing, gain cell, gain cell with internal feedback (IFGC), gain-cell embedded DRAM (GC-eDRAM), logic-compatible eDRAM, low power, static random access memory (SRAM).

## I. INTRODUCTION

DESPITE scaling and the increasing number of transistors on silicon dies, the real estate of many VLSI systems-on-chip (SoCs) is often dominated by the area of embedded memories [1]. While aggressive scaling of device dimensions in deeply scaled technologies enables the integration of more transistors in these systems, it also results in a subsequent increase in parametric variations. As a result, the design of energy-efficient embedded memories has introduced new challenges, as the variations in device characteristics result in reduced noise and timing margins of memory cells, which leads to an increased number of failures [2]. In order to ensure reliable, error-free operation under process scaling, memories are generally designed with sufficient margins, which come at the cost of significant overhead in energy, area, and timing [3], [4].

Manuscript received November 29, 2017; revised February 15, 2018 and March 18, 2018; accepted March 18, 2018. Date of publication May 8, 2018; date of current version June 25, 2018. This paper was approved by Guest Editor Shidhartha Das. This work was supported by the HiPer Consortium through the Israeli Innovation Authority. (*Corresponding author: Robert Giterman.*)

R. Giterman, A. Fish, and A. Teman are with Emerging Nanoscaled Integrated Circuits and Systems Labs, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: robert.giterman@biu.ac.il; alexander.fish@biu.ac.il; adam.teman@biu.ac.il).

N. Geuli and E. Mentovich are with Digital Backend Team, Mellanox Technologies, Yokne'am Illit 2069200, Israel.

A. Burg is with the Telecommunications Circuits Laboratory, Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland (e-mail: andreas.burg@epfl.ch).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2018.2820145

Approximate computing is an emerging design paradigm that exploits the inherent resilience to errors of many applications to relax the conservative requirement for 100% reliable hardware [4], [5]. In this approach, the common, worst case design methodology that is generally applied to avoid any potential error at all costs can be relaxed for systems that allow for some limited quality degradation [6]–[8]. This design paradigm is especially suitable for embedded memories, which are often the first point of failure, when the need for high density and low power consumption (i.e., low voltage operation) leads to conflicting requirements. The freedom to tolerate errors partially resolves this conflict by trading quality for density and/or power consumption. However, in order to unleash the full potential of this approach, memory circuits must be equipped with a knob that allows this tradeoff to be adjusted easily and effectively (i.e., without significant overhead) at design time or even during run time. Embedded DRAMs (eDRAMs) offer a variety of such knobs. At design time, a plethora of different bit-cell topologies are available, which offer tradeoffs between density and retention time [9]–[13]. At run time, the refresh rate can easily be adjusted to trade power and access bandwidth for reliability [12]–[14].

CMOS compatible gain-cell eDRAM (GC-eDRAM) has recently emerged as an interesting alternative to static random access memory (SRAM) due to its high-density, non-destructive read operation, low leakage power, and two-port operation [9]–[13]. However, this technology requires periodic refresh cycles to reliably retain data, which reduces both memory availability and access bandwidth, and consumes dynamic refresh power. While GC-eDRAM implementations in mature technology nodes, such as 90 and 65 nm, provide long data retention times (DRTs), sub-65-nm technologies suffer from much shorter DRTs due to the reduced parasitic storage capacitances and increased leakage currents [9], [10], [13], [15], [16]. Furthermore, while GC-eDRAM often provides higher density and lower retention power than SRAM, it is usually outperformed by SRAM in terms of access speed due to the slower, single-ended readout mechanism of GC-eDRAM, which is degraded further due to a voltage decay on their storage nodes (SNs) prior to a refresh operation. Therefore, the application of GC-eDRAM implementations has often been limited to systems where high frequency operation is of lower priority [11], [17]–[20].

Fig. 1. (a) Conventional 2T. (b) 2TAsy. (c) Low-$V_T$ 3T. (d) Proposed 4T IFGC.

In this paper, we present, for the first time, a GC-eDRAM array implementation in a 28-nm *bulk* CMOS technology.<sup>1</sup> With a conservative refresh rate, this memory can be used for conventional applications, but it is also especially well suited for applications that tolerate a certain percentage of errors in their embedded storage. By using a novel, mixed-$V_T$ four-transistor (4T) gain cell with internal feedback, denoted as a 4T IFGC, this array achieves a significantly higher DRT than other GC-eDRAM topologies in the same technology node, and it is operated at 800 MHz, which is higher than any other reported GC-eDRAM implementation. This performance is achieved with a 30%–45% area advantage and significant power savings as compared with the single-ported six-transistor (6T) SRAM and two-ported eight-transistor (8T) SRAM cells in the same technology.

#### A. Contributions

The main contributions of this paper are summarized as follows.

- 1) This paper presents the first fabricated GC-eDRAM memory in a *bulk* CMOS technology node *below* 65 nm.
- 2) The array is based on a novel, mixed- $V_T$  4T IFGC topology, enabling a much higher operating frequency than previous gain-cell topologies. The resulting memory is the fastest GC-eDRAM array reported in the literature, reaching 800 MHz.
- 3) We consider the use of GC-eDRAM as storage in the context of approximate computing. In particular, we characterize and discuss the energy/quality tradeoff based on actual measurements, rather than on abstract models.

<sup>1</sup>This paper was presented for the first time at the ESSCIRC 2017 Conference. This paper is an invited and extended version of the original manuscript.

#### B. Outline

The rest of this paper is organized as follows. Section II provides an analysis of the retention time characteristics of different established gain-cell topologies under process scaling. Section III describes the proposed mixed- $V_T$  4T IFGC for high-performance operation. Section IV describes the implementation of the complete memory macro. Section V shows the test-chip integration of the memory macro, presents the measurement results, and provides a comparison to other embedded memory options. Section VI describes the operation and design tradeoffs of the proposed memory macro in approximate storage applications. Section VII concludes this paper.

## II. GAIN-CELL DESIGN TOPOLOGIES UNDER PROCESS SCALING

The conventional and smallest type of GC-eDRAM is based on a two-transistor (2T) bitcell, as shown in Fig. 1(a) [9]. This circuit is composed of a write port, including a write transistor (NW), a write word-line (WWL), and a write bit-line (WBL); a read port, including a read transistor (NR), a read word-line (RWL), and a read bit-line (RBL); and a parasitic SN capacitance, mainly composed of the gate capacitance of NR.

In general, pMOS devices have lower gate leakage than nMOS [9], [15], and therefore, they are often the preferred choice for the implementation of the NW in order to reduce the leakage from SN [17]. On the other hand, nMOS devices usually have a higher  $I_{ON}$  current [15], [17], and therefore, they are often used for the implementation of a NR to improve the read access speed. On that basis, an asymmetric 2T (2TAsy) gain cell, shown in Fig. 1(b), was proposed in [15], composed of a regular- $V_T$  pMOS device for write and a low- $V_T$  nMOS device for read.

While the 2T implementation offers the smallest area, a third transistor can be added to the read port in order to avoid RBL drift toward $V_{DD}$ during the readout of a stored “1,” which is caused by leakage from other cells in the same column [11]. An example of a 3T gain cell is shown in Fig. 1(c) [10], implemented with three low-$V_T$ nMOS devices for improved write and read access times, at the cost of a lower DRT due to higher leakage through NW.

While the conventional GC-eDRAM bitcells demonstrated sufficient DRTs in mature technology nodes, they suffer from much lower DRTs in deeply scaled nodes (below 65 nm). This can primarily be attributed to reduced parasitic storage capacitances (smaller gates and low-$k$ interlayer dielectrics) and increased leakage currents (shorter channels and lower $V_T$) [11], [12], [17]. The main leakage mechanism that degrades the data stored in the SN is sub-threshold conduction from SN to WBL. This leakage component has increased by over two orders-of-magnitude with process scaling from 180- to 28-nm CMOS nodes, as shown in Fig. 2(a) for fast, typical, and slow corners at 180-, 65-, and 28-nm CMOS technologies for a regular-$V_T$ nMOS device. As a result of this phenomenon, the effective DRT (EDRT) [21] of these bitcells has substantially decreased with technology scaling. Fig. 2(b) shows the cumulative distribution functions of the EDRT for the above-described 2T, 2TAsy, and 3T gain-cell topologies under technology scaling, from 180- down to 28-nm technologies. This quantitative analysis clearly shows that the EDRT has decreased by over three orders-of-magnitude with process scaling. Interestingly, the 2TAsy cell, which had the highest EDRT among the simulated cells at 180 and 65 nm, has a lower EDRT than the 2T cell at the 28-nm node, due to increased reverse-diode leakage and the smaller difference between pMOS and nMOS conduction currents at this technology node.

Fig. 2. (a) Sub-threshold leakage across corners at 180-, 65-, and 28-nm technologies. (b) EDRT cumulative distribution functions of a conventional 2T, 2TAsy, and 3T gain cells.

## III. PROPOSED MIXED-$V_T$ 4T IFGC FOR HIGH-PERFORMANCE OPERATION

In order to enable GC-eDRAM scaling to beyond 65-nm technologies despite the increased leakage currents, a 4T GC-eDRAM bitcell with internal feedback (IFGC) can be used [11], [16]. The 4T IFGC exploits the asymmetrical data deterioration of “1” and “0” in conventional 2T bitcells [11], [15] to extend the DRT of the weaker data level using internal feedback, formed by two additional devices, NB and feedback transistor (NF).

#### A. 4T IFGC Considerations for High-Performance Operation

The IFGC can be implemented with either pMOS or nMOS devices with different $V_T$ options, depending on the application. The implementation in [11], which targeted ultralow power operation, proposed using an all regular-$V_T$ structure. However, this implementation suffers from considerably longer read access times due to the slow discharge of the large RBL capacitance during readout, limiting its use to low frequency operation. The discharge current of the RBL capacitance is further a function of the overdrive voltage of NR ($V_{GS,NR}$), which is equal to the difference between the SN voltage and the threshold voltage of NR ($V_{T,NR}$) during readout ($V_{SN} - V_{T,NR}$). $V_{T,NR}$ can be reduced by selecting a low-$V_T$ transistor for the implementation of NR, and a higher refresh frequency can be applied in order to increase $V_{SN}$ during the readout of a logic-“1.” However, this would result in higher refresh power consumption and lower memory availability. Alternatively, the voltage degradation of logic-“1” can be reduced by strengthening the internal feedback of the 4T IFGC. The internal feedback of the cell depends on the leakage suppression from SN using NF, which charges the internal buffer node (BN) to $V_{DD}$ when storing a “1.” Therefore, implementing NF using a low-$V_T$ device would increase the leakage suppression from SN and slow down the deterioration of a logic-“1” state.

In order to evaluate the read access time for different transistor types, the RBL charge/discharge speed was simulated for cells storing “1” and “0” with nMOS and pMOS transistors, respectively. The different transistor types were used to implement both the NR and NF. The read operation was performed following a 1- $\mu$ s retention period. The simulations included the parasitic capacitances extracted from the layout of the 4T IFGC in a 28-nm CMOS technology for a 128-row memory array. The read access speed was calculated based on the time that was required to charge/discharge the RBL by 400 mV from its initial state ( $V_{DD}$  for nMOS and GND for pMOS). The results are plotted in Fig. 3, showing the RBL voltages for regular- $V_T$  and low- $V_T$  pMOS and nMOS NRs. The results show that an nMOS low- $V_T$  implementation offers the fastest read access speed with the RBL discharging from 0.9 to 0.5 V after 673 ps, while the pMOS low- $V_T$  implementation resulted in the RBL charging from 0 to 0.4 V only after 3874 ps, indicating that an nMOS LVT implementation for NR and NF would be the best choice for the 4T IFGC.
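As a rough cross-check of the simulated delays quoted above, the following sketch converts the two read access times into average RBL slew rates and their ratio. It assumes a constant-current model, in which the average slew rate equals the mean NR current divided by the RBL capacitance. The function and variable names are illustrative and do not come from the paper.

```python
# Average RBL slew rates implied by the simulated read delays quoted in the text.
DELTA_V = 0.4  # volts of RBL swing used to define the read access time

read_delay_s = {
    "nMOS low-VT": 673e-12,   # RBL discharges from 0.9 V to 0.5 V
    "pMOS low-VT": 3874e-12,  # RBL charges from 0 V to 0.4 V
}


def average_slew(delta_v, delay_s):
    """Average slew rate in V/ns (I_avg / C_RBL under a constant-current assumption)."""
    return delta_v / (delay_s * 1e9)


slew = {name: average_slew(DELTA_V, t) for name, t in read_delay_s.items()}
speedup = read_delay_s["pMOS low-VT"] / read_delay_s["nMOS low-VT"]

for name, value in slew.items():
    print(f"{name}: {value:.3f} V/ns")
print(f"nMOS low-VT read is {speedup:.2f}x faster than pMOS low-VT")
```

Under these assumptions, the nMOS low-$V_T$ read port moves the RBL a little under six times faster than the pMOS low-$V_T$ option.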



Fig. 3. RBL transient waveform during a read operation for a 128-row 4T IFGC array, after a 1- $\mu$ s data retention period without refresh for different transistor types.



Fig. 4. 4T IFGC operating mechanism.

#### B. Proposed Mixed-V<sub>T</sub> 4T IFGC

Based on the conclusions of Section III-A, the proposed mixed-V<sub>T</sub> all-nMOS 4T bitcell is presented in Fig. 1(d). The corresponding circuit features low-V<sub>T</sub> devices implementing the NF and NR to enable faster evaluation of the RBL voltage during read, which results in fast read access time. Moreover, the low-V<sub>T</sub> NF transistor strengthens the internal feedback of the cell, resulting in improved DRT characteristics.

Fig. 5. (a) Data degradation curves of “1” (blue) and “0” (red) stored in the 4T IFGC under worst case biasing conditions. (b) Estimated retention time distributions of a conventional 2T gain cell and the proposed mixed-V<sub>T</sub> 4T IFGC.

The signal waveforms for read and write operations of “1” and “0” in a 4T IFGC are provided in Fig. 4. Write is performed by charging WWL to a boosted voltage ( $V_{BOOST}$ ) to avoid the  $V_T$ -drop when passing a “1” on the WBL through an nMOS device to the SN. Read is performed by precharging RBL to  $V_{DD}$  and then driving RWL to GND. If the cell stores a “1,” RBL is discharged, while it remains charged if the cell stores a “0.”
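The role of the boosted WWL level can be illustrated with a first-order sketch of the level passed by an nMOS write transistor, together with an idealized read. The threshold voltage `V_T_NW` is a hypothetical value chosen only for illustration; body effect and leakage are ignored, and the read model assumes a full-swing RBL discharge. The 1.2-V boost level is the one reported for the measurements in Section V.

```python
V_DD = 0.9     # nominal supply voltage from the text, in volts
V_BOOST = 1.2  # boosted WWL level used in the measurements, in volts
V_T_NW = 0.3   # HYPOTHETICAL threshold voltage of NW, in volts (not from the paper)


def written_sn_voltage(v_wbl, v_wwl, v_t=V_T_NW):
    """First-order SN level after a write through an nMOS pass device."""
    # An nMOS device passes at most (gate voltage - threshold voltage).
    return max(0.0, min(v_wbl, v_wwl - v_t))


def rbl_after_read(stored_bit, v_dd=V_DD):
    """Idealized read: RBL is precharged to V_DD, then RWL is driven to GND.

    A stored '1' turns NR on and discharges RBL; a stored '0' leaves it charged.
    """
    return 0.0 if stored_bit == 1 else v_dd


print(f"'1' written with WWL = V_DD:    {written_sn_voltage(V_DD, V_DD):.2f} V")
print(f"'1' written with WWL = V_BOOST: {written_sn_voltage(V_DD, V_BOOST):.2f} V")
```

With the assumed threshold, an unboosted WWL leaves the stored “1” a full $V_T$ below $V_{DD}$, whereas the boosted level restores it to the supply.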

Fig. 5(a) shows the data deterioration of “1” (blue) and “0” (red) following a write operation under worst case biasing conditions. While the internal feedback path partially compensates the sub- $V_T$  leakage from SN when storing “1,” this state still deteriorates faster than the “0” state due to the diffusion leakage from the bulk, which discharges SN. In fact, the retention time of “0” remains high despite the feedback due to the additional gate capacitance of NF and the stacking of NW and NB, which reduce the sub- $V_T$  leakage from WBL.



Fig. 6. Layout views of (a) 4T IFGC and (b) 6T SRAM bitcells, and (c) 128 × 32 4T IFGC memory macro in a 28-nm technology.

The increased DRT and faster read access provided by the mixed- $V_T$  4T IFGC enable the continued integration of GC-eDRAM technology in light of process scaling. This is demonstrated in Fig. 5(b), which shows the EDRT distributions of a conventional 2T gain cell in 65- and 28-nm technologies, compared with the EDRT of the proposed mixed- $V_T$  4T IFGC in 28 nm. The results clearly indicate that the degraded EDRT of the 2T cell in 28 nm is recovered by the implementation of the mixed- $V_T$  4T IFGC, offering a two orders-of-magnitude higher EDRT in a 28-nm bulk CMOS technology. The distributions were extracted from 1000 Monte Carlo (MC) statistical simulations, including mismatch and global variations under worst case biasing conditions, where WBL is driven to the opposite level of SN, resulting in the highest leakage from SN.

## IV. 4T IFGC MEMORY MACRO IMPLEMENTATION

#### A. Layout

A 4-kbit (128 × 32) memory macro based on the proposed mixed-$V_T$ 4T IFGC was implemented in a 28-nm bulk CMOS technology. The layout view of the 4T IFGC is shown in Fig. 6(a), featuring the WBL and RBL lines routed in M2 (horizontal), RWL and $V_{DD}$ lines routed in M3 (vertical), and WWL routed in poly for reduced parasitic capacitance. Minimum-sized devices were used for the implementation of the 4T IFGC, composed of two low-$V_T$ transistors (NF and NR) and two regular-$V_T$ transistors (NW and NB). The 4T IFGC measures 0.4 μm × 0.63 μm (0.252 μm²). Fig. 6(b) shows the layout of a 6T SRAM cell, drawn with standard design rules in the same technology node. This circuit is 0.26 μm × 1.3 μm (0.338 μm²), which is 34% larger than the proposed 4T IFGC. A similar layout style can be used in more advanced technology nodes, which employ FinFET-based devices. In these technologies, we expect the area advantage of the 4T IFGC over SRAM to decrease due to the increased area penalty incurred by the minimum distance between unshared transistor diffusions.

The layout of the 4-kbit 4T IFGC memory macro is shown in Fig. 6(c), with a silicon footprint of 31 μm × 65 μm. For comparison, a similar-sized, single-port SRAM macro, with “pushed-rule” bitcell layout, has a total silicon footprint of 42 μm × 68 μm, which is over 30% larger than the proposed macro. Note that, based on the target DRT, the read delay of the 4T IFGC array can be higher than the read delay of the 6T SRAM array. This is due to the degraded voltage on the SN as well as the longer vertical dimension of the bitcell, which increases the RBL capacitance compared with the BL capacitance of the 6T SRAM cell. Under ISO read conditions,<sup>2</sup> the width of the NR of the 4T IFGC cell needs to be upsized by 80% in order to reduce the read delay to match that of a 6T SRAM array. As a result, the macro area savings are reduced to 26%. It is worth mentioning that the 4T IFGC macro contains two (write and read) row decoders, WWL level shifters, sense amplifiers, and all necessary BL and WL drivers.

<sup>2</sup>An ISO read condition is defined as equal RBL and BL delays for 4T IFGC and 6T SRAM arrays of the same size during a read operation. The read of the 4T IFGC was performed following a 5-μs retention period for data “1,” under worst case biasing conditions.

Fig. 7. (a) Differential sense amplifier. (b) Programmable SAE generation. (c) Waveform demonstration. (d) RBL sensing delay for the differential sense amplifier and a CMOS sense inverter.
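The area figures quoted in this subsection can be reproduced from the stated dimensions with a few lines of arithmetic. The sketch below is only a sanity check; the helper names are illustrative and not part of the original work.

```python
def area_um2(width_um, height_um):
    """Rectangle area in square micrometers."""
    return width_um * height_um


def percent_larger(a, b):
    """How much larger a is than b, in percent."""
    return 100.0 * (a / b - 1.0)


cell_4t = area_um2(0.4, 0.63)   # 4T IFGC bitcell
cell_6t = area_um2(0.26, 1.3)   # 6T SRAM bitcell, standard design rules
macro_4t = area_um2(31, 65)     # 4-kbit 4T IFGC macro
macro_sram = area_um2(42, 68)   # single-port SRAM macro, pushed-rule bitcell

print(f"4T IFGC cell: {cell_4t:.3f} um^2, 6T SRAM cell: {cell_6t:.3f} um^2")
print(f"6T cell is {percent_larger(cell_6t, cell_4t):.1f}% larger")
print(f"4T macro: {macro_4t} um^2, SRAM macro: {macro_sram} um^2")
print(f"SRAM macro is {percent_larger(macro_sram, macro_4t):.1f}% larger")
```

The cell comparison reproduces the 34% figure, and the macro comparison is consistent with the statement that the SRAM macro is over 30% larger.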

#### B. Differential Sense Amplifier for High-Speed Read Access

High-density GC-eDRAM implementations often contain a simple inverter to sense the voltage of the RBL during readout, offering a low-area solution for single-ended readout [13], [18]. However, this sensing scheme is insufficient for high-speed memories, as it requires the large RBL capacitance to discharge past the switching threshold of this sense inverter. Furthermore, leakage from unselected cells in the column sharing the same RBL often slows down the RBL discharge process and causes it to saturate before reaching the switching threshold of the inverter, which results in read failures [10], [15], [18].

In order to support high-speed read access, a voltage-based differential sense amplifier with an external reference voltage ($V_{\text{REF}}$) was implemented to enable faster evaluation of the RBL voltage. The schematic of the proposed sense amplifier is shown in Fig. 7(a), accompanied by a simulated waveform illustration in Fig. 7(c).

Prior to a readout operation, RBL is precharged to $V_{\text{DD}}$ using the control signal PC. In addition, the sense amplifier enable (SAE) signal is low, disabling the discharge path through transistor NPD and precharging OUT and OUTB to $V_{\text{DD}}$. The SAE signal must be asserted only once a voltage difference larger than the internal offset of the sense amplifier has developed between RBL and $V_{\text{REF}}$. The SAE signal is generated by a programmable delay line, shown in Fig. 7(b), so that its timing can be adjusted according to the selected memory refresh rate and operating frequency.

Fig. 7(d) shows the simulated RBL sensing delay distribution for the differential sense amplifier with $V_{\text{REF}}$ set to 800 mV and for a single-ended reference sense inverter with a skewed switching threshold. Each distribution represents the delay variation of a 128-row 4T IFGC array, with a given SN voltage level that corresponds to the mean value of “1” after a 5-$\mu\text{s}$ retention period. The simulations were made under worst case biasing conditions, in which all 127 unselected cells in the column hold a strong “1” level, resulting in the maximum leakage from unselected RWLs to the RBL. The simulations also included the parasitic components, which were extracted from the layout of the array, as well as process variations, which affect the internal offset of the sense amplifier and the discharge current of the RBL. The results clearly indicate that a differential sense amplifier yields an almost three times faster RBL sensing delay than a skewed sense inverter, with an average value of 269 ps.

Fig. 8. (a) Level-shifting WWL drivers. (b) Timing diagram.

#### C. Level-Shifting WWL Drivers

Recall that the write operation requires driving the selected WWL to a boosted voltage in order to overcome the $V_T$-drop when writing a “1” through an nMOS transistor. To achieve this with small area and power overhead, we employ a single global level shifter that provides the boosted voltage to the write post-decoder during write operations and a low area, per-row, level-shifting WWL driver to drive $V_{BOOST}$ to the selected row for writing. For this test chip, $V_{BOOST}$ has been generated off-chip and supplied to the memory macro using an analog I/O pad. Fig. 8(a) shows the schematic of the level-shifting WWL driver, and Fig. 8(b) shows the corresponding simulated waveforms. During the high phase of the clock, the gated write enable (WEG) signal is disabled in order to reset all the WWLs during the evaluation delay of the decoder. During the low phase of the clock, the global level shifter, shown inside the dashed line in Fig. 8(a), generates the shifted write enable (WES) signal, which cuts off the pMOS pull-up (PU) devices of every WWL driver. The nMOS pull-down device of the WWL driver is controlled by the decoder enable (DE) signal, generated at the output of a NOR gate, driven by the decoder output address (DecOut) and the inverted write enable (WEN) signal.

## V. MEASUREMENT RESULTS AND COMPARISON

#### A. Measurement Results

A micrograph of the 28-nm 4-kbit ( $128 \times 32$ ) memory test chip is shown in Fig. 9 together with the key characteristics of the implemented mixed- $V_T$  4T IFGC memory macro. The test chip comprises a test setup, including a PLL for high-speed testing, direct memory access mode for debug, and an on-chip built-in self-test (BIST) for full memory characterization. An I<sup>2</sup>C interface was used to configure the operating mode of the test setup.

The test chips were packaged and connected to a Xilinx evaluation board to enable test control with the standard FPGA. The array was tested with both predefined tests, using the memory BIST, as well as with specific tests programmed into the FPGA and propagated through the I/O pads of the chip. All 11 packaged chips were fully operational across the complete range of supply voltages from 0.6 to 1 V, which were the minimum and maximum voltages that were supported by the test setup. At the nominal technology supply voltage of 0.9 V, the memory was successfully operated across a frequency range from 100 to 800 MHz at room temperature. To the best of our knowledge, this makes it the fastest operating GC-eDRAM array reported in the literature.

Fig. 10 shows the measured retention time maps of the 4-kbit mixed-$V_T$ 4T IFGC array, operated with an 800-MHz clock frequency and a 0.9-V supply voltage at temperatures of 0, 27, and 85 °C. $V_{BOOST}$ and $V_{REF}$ were set to 1.2 and 0.8 V, respectively. The retention time maps were extracted by writing a selected data pattern to the array, followed by a preconfigured idle time. Then, the array was read out, and the output data were compared with the written data. The idle time period between write and read operations was swept to extract the retention time of every bit in the array. The results reflect the lower DRT of the two levels for each cell.
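The extraction procedure described above (write a pattern, idle, read, compare, and sweep the idle time) can be sketched as follows. This is a minimal illustration and not the authors' test code: `ToyGainCellArray` is a hypothetical software stand-in for the chip, with arbitrary per-bit retention times, and all-0 and all-1 patterns are assumed as the selected data patterns.

```python
import random


class ToyGainCellArray:
    """HYPOTHETICAL stand-in for the test chip, with one retention time per data level."""

    def __init__(self, rows, cols, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # Retention times in microseconds, drawn from an arbitrary log-uniform range.
        self.drt = [[{0: 10 ** rng.uniform(1, 3), 1: 10 ** rng.uniform(0, 3)}
                     for _ in range(cols)] for _ in range(rows)]
        self.data = [[0] * cols for _ in range(rows)]

    def write(self, pattern):
        self.data = [row[:] for row in pattern]

    def read_after_idle(self, idle_us):
        # A bit flips once the idle time exceeds the retention time of its stored level.
        return [[bit if idle_us <= self.drt[r][c][bit] else 1 - bit
                 for c, bit in enumerate(row)] for r, row in enumerate(self.data)]


def retention_map(array, idle_times_us):
    """Per-bit DRT: the longest swept idle time survived for both data levels."""
    times = sorted(idle_times_us)
    best = [[times[-1]] * array.cols for _ in range(array.rows)]
    for value in (0, 1):
        pattern = [[value] * array.cols for _ in range(array.rows)]
        passed = [[0.0] * array.cols for _ in range(array.rows)]
        alive = [[True] * array.cols for _ in range(array.rows)]
        for idle in times:
            array.write(pattern)                    # write the selected pattern
            readout = array.read_after_idle(idle)   # idle, then read back
            for r in range(array.rows):
                for c in range(array.cols):
                    if alive[r][c] and readout[r][c] == pattern[r][c]:
                        passed[r][c] = idle
                    else:
                        alive[r][c] = False         # first failure ends the sweep for this bit
        for r in range(array.rows):
            for c in range(array.cols):
                best[r][c] = min(best[r][c], passed[r][c])  # keep the weaker level
    return best


array = ToyGainCellArray(rows=4, cols=8, seed=1)
times = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
drt_map = retention_map(array, times)
print("worst-case DRT in the toy array (us):", min(min(row) for row in drt_map))
```

Taking the minimum over the two patterns mirrors the statement that the reported maps reflect the lower DRT of the two levels for each cell.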

The average DRT was found to be 279, 264, and 250 $\mu$s for 0, 27, and 85 °C, respectively. The DRT decreases with temperature due to the increase in leakage currents, dominated by sub-$V_T$ conduction and reverse-biased diode leakage, which exponentially depend on the temperature [22]. In addition, the ON current through NR during a readout of “1” is decreased with temperature due to lower mobility, reducing the DRT of “1,” since the RBL discharges slowly. Nevertheless, the internal feedback incorporated in the 4T IFGC cell reduces the influence of increased sub-$V_T$ leakage of NW on the value stored on the SN, since the sub-$V_T$ feedback current through NF increases with increasing temperature. While the average DRT of the array at 85 °C is 250 $\mu$s, the DRT distribution is spread across three orders-of-magnitude, with an observed worst case DRT of as low as 1 $\mu$s. This is a direct result of the mismatch between transistors in the memory array. The variation in the threshold voltage of nMOS devices not only affects the leakage, which degrades the voltage on the SN, but also affects the RBL discharge rate when reading a “1,” adding to the total variation of the DRT for a given maximum read access delay.

Fig. 9. Die photograph and key features of the 4T IFGC test chip.

Fig. 10. DRT maps at (a) 0, (b) 27, and (c) 85 °C.

To demonstrate the effect of global die-to-die variations on the DRT of the 4T IFGC array, Fig. 11(a) shows the percentage of DRT failures of 11 measured chips as a function of the time after write at 85 °C. Fig. 11(b) shows the cumulative distribution function of the DRT across temperatures, as extracted from the measured chips. The results indicate that the 4T IFGC array can be operated with a refresh period of 5 $\mu$s with 99% of the bits providing reliable data retention. For a memory macro with 128 rows operated at a clock frequency of 800 MHz, this refresh period corresponds to 93% memory availability for write and read operations, despite refresh cycles. The retention power, which is the sum of the leakage and refresh power of the 4-kbit memory macro for a 5-$\mu$s refresh period, was found to be 40, 43, and 45 $\mu$W for 0, 27, and 85 °C, respectively. The refresh power constitutes between 78% and 80% of the total retention power. To evaluate the DRT for higher sigmas, a power transformation was applied to the distribution of Fig. 10 to extrapolate the six-sigma average DRT [11], as shown in Fig. 12. The six-sigma DRT was found at 182, 175, and 123 $\mu$s at 85, 27, and 0 °C, respectively.

Fig. 11. (a) Percentage of DRT failures of 11 measured chips. (b) Cumulative distribution function of the DRT across temperatures.

Fig. 12. DRT extrapolation across a six-sigma range for temperatures of 0, 27, and 85 °C.
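The memory availability quoted above can be approximated with a short calculation. The sketch below assumes that refreshing one row occupies two clock cycles (one read and one write-back); this cycle count is an assumption rather than a figure stated in the text, and the function name is illustrative.

```python
def memory_availability(rows, f_clk_hz, refresh_period_s, cycles_per_row=2):
    """Fraction of time left for normal accesses when every row is refreshed once per period.

    cycles_per_row is an ASSUMPTION (read plus write-back), not a value from the paper.
    """
    refresh_time_s = rows * cycles_per_row / f_clk_hz
    return 1.0 - refresh_time_s / refresh_period_s


availability = memory_availability(rows=128, f_clk_hz=800e6, refresh_period_s=5e-6)
print(f"availability: {availability:.1%}")
```

Under this assumption, refreshing all 128 rows takes 320 ns of every 5-$\mu$s period, which is in line with the reported availability of about 93%.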



Fig. 13. DRT as a function of the operating frequency.

Fig. 13 shows the DRT of the arrays when operated under different clock frequencies for both 100% and 99% bit yields (percentage of reliable cells). The DRT decreases with frequency due to a shorter read cycle, which results in a limited RBL swing when the cell stores a degraded “1” voltage. The measured DRT is increased by over two times when the operating frequency is reduced from 800 to 100 MHz. This point is further discussed in Section VI.

#### B. Comparison

A comparison between the proposed mixed-$V_T$ 4T IFGC and other logic-compatible embedded memory bitcells is provided in Table I. The bitcells under consideration include a 2TAsy gain cell [15], a conventional 2T gain cell [9], a 3T gain cell [10], and a 6T SRAM cell. These gain-cell implementations are compared based on both their original implementation technology, as well as on simulated results in the same 28-nm node. While the 2T and 3T gain-cell structures provide lower area than the proposed 4T IFGC, their DRT in the 28-nm node is severely degraded to below 100 ns. On the other hand, the proposed 4T IFGC has a 5-$\mu$s DRT at 85 °C while also providing the highest operating speed compared with other gain-cell implementations. The proposed cell has between 30% and 45% lower cell area than redrawn single-port and two-port SRAM cells with logic design rules, respectively, as well as a 30% lower macro area than a single-port SRAM macro implemented with the “pushed-rule” layout. Moreover, the proposed two-ported 4T IFGC consumes over 50 times less bitcell leakage power and over 5 times less retention power (under a 5-$\mu$s refresh period), compared with the leakage power of the 6T and 8T SRAM cells in the same technology.

## VI. 4T IFGC AS AN APPROXIMATE MEMORY

The inherent error tolerance of many applications, such as algorithms in the fields of communications and signal processing, paved the way to a new design paradigm, where nonzero error probabilities on the data can often be tolerated [23]. In the context of memories, the concept of allowing a small number of errors to occur in order to reduce the power consumption has been suggested by several groups [8], [14], [24]. For GC-eDRAM arrays, the tradeoff between low power consumption and reduced data integrity can conveniently be adjusted (even at run time) by configuring the refresh rate of the array to fit within the system requirements.

TABLE I  
COMPARISON BETWEEN THE PROPOSED DESIGN AND OTHER LOGIC-COMPATIBLE MEMORY OPTIONS

|                                           | Single-Port 6T SRAM         | 2-Port 8T SRAM       | Asymmetric 2T Gain-Cell [15] | Conventional 2T Gain-Cell [9] | 3T Low-V <sub>t</sub> Gain-Cell [10] | Proposed Mixed-V <sub>t</sub> 4T IFGC |
|-------------------------------------------|-----------------------------|----------------------|------------------------------|-------------------------------|--------------------------------------|---------------------------------------|
| Cell Structure                            |                             |                      |                              |                               |                                      |                                       |
| Technology Node                           | 28 nm                       | 28 nm                | 65 nm                        | 28 nm                         | 65 nm                                | 28 nm                                 |
| Cell Size ( $\mu\text{m}^2$ )*            | 0.34 $\mu\text{m}^2$        | 0.44 $\mu\text{m}^2$ | 0.63 $\mu\text{m}^2$         | 0.17 $\mu\text{m}^2$          | 0.627 $\mu\text{m}^2$                | 0.16 $\mu\text{m}^2$                  |
| Cell Size ( $F^2$ )*                      | 431 $F^2$ [1X]              | 560 $F^2$ [1.3X]     | 150 $F^2$                    | 220 $F^2$ [0.51X]             | 149 $F^2$                            | 203 $F^2$ [0.47X]                     |
| 4Kbit Macro Size                          | 2856 $\mu\text{m}^2$ [1X]** | N/A                  | N/A                          | N/A                           | N/A                                  | 245 $F^2$ [0.73X]                     |
| Supply Voltage                            | 0.9 V                       | 0.9 V                | 1.2 V                        | 0.9 V                         | 0.9 V                                | 0.9 V                                 |
| Maximum Array Freq.                       | N/A                         | N/A                  | 667 MHz                      | N/A                           | 250 MHz                              | 800 MHz                               |
| Retention Time***                         | Static                      | Static               | 110 $\mu\text{s}$            | 36 ns                         | 10 $\mu\text{s}$                     | 5 $\mu\text{s}$                       |
| Bit-cell Leakage Power                    | 47 nW/bit                   | 58 nW/bit            | N/A                          | 0.6 nW/bit                    | 0.4 nW/bit                           | 0.4 nW/bit                            |
| Array Retention Power (Leakage + Refresh) | 192 $\mu\text{W}$           | 237 $\mu\text{W}$    | N/A                          | 6.25 mW                       | 2.47 mW                              | 15 mW                                 |

\*Logic design rule

\*\*SRAM design rule

\*\*\*99% yield

\*\*\*\*All results are at 85 °C

Fig. 14. Power consumption of the 4T IFGC array across different refresh periods compared with a 6T SRAM.

#### A. Trading Reliability for Refresh Rate and Power

Recall that the refresh rate of GC-eDRAM is typically set according to the DRT of the worst cell in the array. This introduces significant overhead, as the average DRT of a GC-eDRAM array is a few orders of magnitude higher than that of the worst cell, as illustrated in Section V. Therefore, while the refresh rate required by the implemented 4T IFGC array for fully reliable storage may be high for instruction or data caches, it can be set much lower with only a small impact on the error rate when storing data with relaxed reliability requirements, thereby providing a significant power reduction.

To illustrate the possible power savings by increasing the refresh period, Fig. 14 shows the retention power consumption of the 4T IFGC array, which is composed of the leakage and refresh power components, as a function of the refresh period, and compares it with the leakage power consumption of a 6T SRAM array. For short refresh periods, the refresh power component is responsible for over 80% of the total retention power consumption of the 4T IFGC array, and therefore, the retention power consumption decreases almost linearly with the refresh frequency. At longer refresh periods, where the leakage power is more dominant, the total retention power decreases much more slowly with the refresh period. On the other hand, the static power of a 6T SRAM array stays constant, as it is purely composed of the leakage of the array and peripherals.

Fig. 15. Number of retention time failures after a 50-$\mu\text{s}$ retention period across frequencies.
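The retention power trend discussed for Fig. 14 can be approximated with a simple first-order model. The following Python sketch is illustrative only and is not the authors' measurement flow: it assumes a 4096-bit array, takes the per-bit leakage values from Table I, assumes that every full-array refresh costs a fixed energy (so that refresh power scales as one over the refresh period), anchors that energy to the 179-$\mu$W refresh power reported for a 1-$\mu$s refresh period, and ignores peripheral leakage. All function and constant names are hypothetical.

```python
# First-order retention power model (a sketch under stated assumptions).
N_BITS = 4096                     # assumption: 4-Kbit array
LEAK_NW_PER_BIT_GC = 0.4          # Table I, proposed 4T IFGC bitcell leakage
LEAK_NW_PER_BIT_SRAM = 47.0       # Table I, 6T SRAM bitcell leakage
REFRESH_POWER_UW_AT_1US = 179.0   # refresh power reported for a 1-us period


def refresh_power_uw(period_us: float) -> float:
    """Refresh power, assuming a fixed energy per full-array refresh."""
    if period_us <= 0:
        raise ValueError("refresh period must be positive")
    return REFRESH_POWER_UW_AT_1US / period_us


def gc_retention_power_uw(period_us: float) -> float:
    """Bitcell leakage plus refresh power of the gain-cell array."""
    leakage_uw = N_BITS * LEAK_NW_PER_BIT_GC / 1000.0
    return leakage_uw + refresh_power_uw(period_us)


def sram_static_power_uw() -> float:
    """Bitcell leakage of a 6T SRAM array of the same capacity."""
    return N_BITS * LEAK_NW_PER_BIT_SRAM / 1000.0


if __name__ == "__main__":
    print(f"6T SRAM leakage: {sram_static_power_uw():.1f} uW")
    for period in (1.0, 5.0, 10.0, 50.0):
        total = gc_retention_power_uw(period)
        share = refresh_power_uw(period) / total
        print(f"{period:5.1f} us: {total:7.2f} uW, refresh share {share:.0%}")
```

Under these assumptions, the SRAM leakage evaluates to about 192.5 $\mu$W and the gain-cell retention power at a 5-$\mu$s refresh period to roughly 37 $\mu$W, which is in line with the "over 5 times" statement in Section V-B. The pure 1/T scaling only approximately reproduces the reported 3-$\mu$W refresh component at 50 $\mu$s, so the model should be read as a trend illustration rather than a fit.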

While the 4T IFGC memory design was optimized for fast access speed, the speed requirements can also be relaxed to achieve a longer DRT and thereby reduce the necessary refresh rate. This tradeoff knob is available because the read access time heavily depends on the time required to discharge the large RBL capacitance when a “1” is stored in the cell. With the SN voltage already degraded following a data retention period, the overdrive of NR is decreased and RBL discharges slowly. Therefore, by relaxing the frequency constraints and allowing more time for the RBL to discharge, the refresh period of the array can be extended. Fig. 15 demonstrates this tradeoff by showing the number of measured DRT failures after a 50-$\mu$s retention period, as extracted from operating the array at different frequencies, ranging from 100 to 800 MHz. As expected, the number of DRT failures increases with frequency, resulting in over two times more failures at 800 MHz compared with a 100-MHz operating frequency.

Fig. 16. Demonstration of an image stored in the 4T IFGC array at refresh periods of (a) 1 $\mu$s (PSNR = 40.93 dB), (b) 5 $\mu$s (PSNR = 29.98 dB), (c) 10 $\mu$s (PSNR = 23.76 dB), and (d) 50 $\mu$s (PSNR = 18.4 dB). (e) Retention power savings versus PSNR of an image stored in the 4T IFGC array.

#### B. Image Quality Versus Retention Power Savings

To illustrate the benefits of exploiting the long DRT tail of the implemented 4T IFGC array to provide a tradeoff between reliability (error rate) and power consumption, we demonstrate the effect of approximate storage on image data, based on the measured DRT. A common metric to quantify image quality is the peak signal-to-noise ratio (PSNR).

Considering an 8-bit value describing the luminance of an image pixel, the PSNR is defined as

$$\text{PSNR} = 20 \log_{10} \frac{255}{\sqrt{\text{MSE}}} \quad (1)$$

where MSE is the mean square error. For this analysis, the MSE was obtained using MATLAB simulations.
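As a worked illustration of (1), the following minimal Python sketch computes the MSE and the PSNR for two equally sized lists of 8-bit pixel values. The paper's own evaluation was carried out in MATLAB; Python is used here only for illustration, and the function names are hypothetical.

```python
import math


def mean_square_error(reference, degraded):
    """Mean square error between two equally sized pixel sequences."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)


def psnr_db(reference, degraded, peak=255):
    """PSNR in dB as in (1); returns infinity for identical images."""
    mse = mean_square_error(reference, degraded)
    if mse == 0:
        return math.inf
    return 20 * math.log10(peak / math.sqrt(mse))


if __name__ == "__main__":
    original = [10, 200, 35, 90]
    stored = [10, 200, 35, 218]   # one pixel with its most significant bit flipped
    print(f"PSNR = {psnr_db(original, stored):.2f} dB")
```

A single flipped most significant bit changes a pixel by 128 gray levels, which is why a small fraction of failing bits can already dominate the MSE.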

This evaluation is shown in Fig. 16, demonstrating the errors in the picture stored in a set of specific instances of the array under different refresh rates. The patterns of these array instances were created by first choosing the DRT of each bit according to the measured cumulative distribution function of the DRT of the dies, as demonstrated in Fig. 11(b). Each bit with a DRT shorter than the refresh period was then assumed to be unreliable.
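The procedure described above can be sketched in Python as follows. This is a hedged illustration, not the authors' flow: the measured DRT distribution of Fig. 11(b) is not reproduced here, so a log-normal distribution with hypothetical parameters stands in for it, and an unreliable bit is modeled as a bit flip, which is an assumption, since the text only states that such bits are considered unreliable. All names are hypothetical.

```python
import math
import random


def sample_drt_us(n_bits, median_us, sigma, rng):
    """Draw one DRT per bit from a log-normal stand-in for the measured CDF."""
    mu = math.log(median_us)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_bits)]


def failing_bits(drt_us, refresh_period_us):
    """A bit is unreliable if its DRT is shorter than the refresh period."""
    return [drt < refresh_period_us for drt in drt_us]


def corrupt_pixels(pixels, fail_mask):
    """Flip every unreliable bit; bit k of pixel i maps to mask index 8*i + k."""
    corrupted = []
    for i, pixel in enumerate(pixels):
        for k in range(8):
            if fail_mask[8 * i + k]:
                pixel ^= 1 << k   # assumption: a failing bit reads inverted
        corrupted.append(pixel)
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)
    pixels = [rng.randrange(256) for _ in range(512)]   # 512 pixels = 4096 bits
    # Hypothetical distribution parameters, chosen only for illustration.
    drt = sample_drt_us(8 * len(pixels), median_us=100.0, sigma=1.5, rng=rng)
    for period_us in (1.0, 5.0, 10.0, 50.0):
        mask = failing_bits(drt, period_us)
        stored = corrupt_pixels(pixels, mask)
        changed = sum(p != s for p, s in zip(pixels, stored))
        print(f"{period_us:5.1f} us: {sum(mask)} failing bits, {changed} pixels changed")
```

Because the same per-bit DRT sample is reused for every refresh period, the set of failing bits can only grow as the refresh period is extended, mirroring the monotonic quality loss reported for Fig. 16.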

The selected refresh periods of this experiment ranged from 1 to 50 $\mu$s, resulting in refresh power components ranging from 179 down to 3 $\mu$W, respectively. Clearly, the image shows a larger number of erroneous pixels as the refresh period is extended, compromising accuracy for power savings and array availability. The tradeoff between the PSNR and possible refresh power savings is demonstrated in Fig. 16(e), showing the achieved image PSNR for different refresh periods ranging from 1 to 50 $\mu$s. As expected, the PSNR decreases as the refresh period is increased due to a higher number of failing bits. On the other hand, the retention power is significantly reduced as the refresh period is increased, providing an interesting tradeoff between the image accuracy and the power consumption of the memory.

## VII. CONCLUSION

In this paper, we demonstrated a novel mixed-$V_T$ 4T IFGC eDRAM, which employs an internal feedback mechanism to achieve high DRTs at deeply scaled technologies and a fast readout port for high-performance operation. A memory macro based on the 4T IFGC array and optimized for high speed is the first GC-eDRAM implementation in a 28-nm bulk process. The memory macro operates at a frequency of 800 MHz, which is higher than any other gain-cell implementation in the literature. Moreover, it provides almost 30% lower area and significant power savings as compared with a conventional, single-ported SRAM. The array can maintain a 99% bit yield if it is operated with a refresh period of 5 $\mu$s, thereby fitting the requirements of many approximate storage systems. The use of the manufactured memory as an approximate storage solution was demonstrated on image data using the PSNR metric. The array was shown to provide significant refresh power savings while retaining a slightly degraded, but acceptable, image quality of 30-dB PSNR.

Technology scaling to below 20-nm process nodes can be beneficial for the proposed GC-eDRAM technology, as enhanced control of the channel using fully depleted silicon on insulator or FinFET technologies can result in lower sub- $V_T$  leakage and higher DRT, offering an energy-efficient solution for embedded memory implementation. Moreover, “pushed-rules” layout of the GC-eDRAM bitcell will result in even higher area savings.

## ACKNOWLEDGMENT

The authors would like to thank Y. Weizman, Y. Strassberg, I. Hashai, K. Talisveyberg, O. Shalev, R. Kauffman, I. Braun, O. Eini, and R. Golman for their help in the design and measurements of the test chip.

## REFERENCES

- [1] ITRS. (2015). *International Technology Roadmap for Semiconductors—2015 Edition*. [Online]. Available: <http://www.itrs2.nets>
- [2] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 24, no. 12, pp. 1859–1880, Dec. 2005.
- [3] S. Bhunia and S. Mukhopadhyay, *Low-Power Variation-Tolerant Design in Nanometer Silicon*. New York, NY, USA: Springer, 2010.
- [4] P. Gupta *et al.*, "Underdesigned and opportunistic computing in presence of hardware variability," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 32, no. 1, pp. 8–23, Jan. 2013.
- [5] A. Sampson, J. Nelson, K. Strauss, and L. Ceze, "Approximate storage in solid-state memories," *ACM Trans. Comput. Syst.*, vol. 32, no. 3, p. 9, 2014.
- [6] F. Frustaci, M. Khayatzadeh, D. Blaauw, D. Sylvester, and M. Alioto, "A 32 kb SRAM for error-free and error-tolerant applications with dynamic energy-quality management in 28 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 244–245.
- [7] F. Frustaci, D. Blaauw, D. Sylvester, and M. Alioto, "Approximate SRAMs with dynamic energy-quality management," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 6, pp. 2128–2141, Jun. 2016.
- [8] A. Kazimirsky, A. Teman, N. Edri, and A. Fish, "A 0.65-V, 500-MHz integrated dynamic and static RAM for error tolerant applications," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 9, pp. 2411–2418, Sep. 2017.
- [9] D. Somasekhar *et al.*, "2 GHz 2 Mb 2T gain cell memory macro with 128 GBytes/sec bandwidth in a 65 nm logic process technology," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 174–185, Jan. 2009.
- [10] Y. S. Park, D. Blaauw, D. Sylvester, and Z. Zhang, "Low-power high-throughput LDPC decoder using non-refresh embedded DRAM," *IEEE J. Solid-State Circuits*, vol. 49, no. 3, pp. 783–794, Mar. 2014.
- [11] R. Giterman, A. Fish, A. Burg, and A. Teman, "A 4-transistor nMOS-only logic-compatible gain-cell embedded DRAM with over 1.6-ms retention time at 700 mV in 28-nm FD-SOI," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 4, pp. 1245–1256, Apr. 2018.
- [12] R. Giterman, A. Fish, N. Geuli, E. Mentovich, A. Burg, and A. Teman, "An 800 MHz mixed-V<sub>T</sub> 4T gain-cell embedded DRAM in 28 nm CMOS bulk process for approximate computing applications," in *Proc. Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2017, pp. 308–311.
- [13] P. Meinerzhagen, A. Teman, R. Giterman, N. Edri, A. Burg, and A. Fish, *Gain-Cell Embedded DRAMs for Low-Power VLSI Systems-on-Chip*. Basel, Switzerland: Springer, 2018.
- [14] S. Ganapathy, A. Teman, R. Giterman, A. Burg, and G. Karakonstantis, "Approximate computing with unreliable dynamic memories," in *Proc. IEEE 13th Int. New Circuits Syst. Conf. (NEWCAS)*, Jun. 2015, pp. 1–4.
- [15] K. C. Chun, P. Jain, T.-H. Kim, and C. H. Kim, "A 667 MHz logic-compatible embedded DRAM featuring an asymmetric 2T gain cell for high speed on-die caches," *IEEE J. Solid-State Circuits*, vol. 47, no. 2, pp. 547–559, Feb. 2012.
- [16] R. Giterman, A. Teman, P. Meinerzhagen, A. Burg, and A. Fish, "4T gain-cell with internal-feedback for ultra-low retention power at scaled CMOS nodes," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Jun. 2014, pp. 2177–2180.
- [17] P. Meinerzhagen, A. Teman, R. Giterman, A. Burg, and A. Fish, "Exploration of sub-V<sub>T</sub> and near-V<sub>T</sub> 2T gain-cell memories for ultra-low power applications under technology scaling," *J. Low Power Electron. Appl.*, vol. 3, no. 2, pp. 54–72, 2013.
- [18] R. Giterman, A. Teman, P. Meinerzhagen, L. Atias, A. Burg, and A. Fish, "Single-supply 3T gain-cell for low-voltage low-power applications," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 1, pp. 358–362, Jan. 2016.
- [19] R. Giterman, L. Atias, and A. Teman, "Area and energy-efficient complementary dual-modular redundancy dynamic memory for space applications," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 2, pp. 502–509, Feb. 2017.
- [20] R. Giterman, A. Teman, and P. Meinerzhagen, "Hybrid GC-eDRAM/SRAM bitcell for robust low-power operation," *IEEE Trans. Circuits Syst. II, Express Briefs*, vol. 64, no. 12, pp. 1362–1366, Dec. 2017.
- [21] N. Edri, P. Meinerzhagen, A. Teman, A. Burg, and A. Fish, "Silicon-proven per-cell retention time distribution model of gain-cell based eDRAM," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 2, pp. 222–232, Feb. 2016.
- [22] C. C. Enz and E. A. Vittoz, *Charge-Based MOS Transistor Modeling: The EKV Model for Low-Power and RF IC design*. Hoboken, NJ, USA: Wiley, 2006.
- [23] S. Mittal, "A survey of techniques for approximate computing," *ACM Comput. Surv.*, vol. 48, no. 4, pp. 62:1–62:33, Mar. 2016.
- [24] I. J. Chang, D. Mohapatra, and K. Roy, "A priority-based 6T/8T hybrid SRAM architecture for aggressive voltage scaling in video applications," *IEEE Trans. Circuits Syst. Video Technol.*, vol. 21, no. 2, pp. 101–112, Feb. 2011.



**Robert Giterman** received the B.Sc. degree in electrical engineering from the Ben-Gurion University of the Negev, Beersheba, Israel, in 2013, and the M.Sc. degree from the Ben-Gurion University of the Negev in 2014 as part of a Fast Track Program for outstanding students. He is currently pursuing the Ph.D. degree as part of Emerging Nanoscaled Integrated Circuits and Systems Labs with Bar-Ilan University, Ramat Gan, Israel, under the supervision of Prof. A. Fish.

He has authored/co-authored 15 journal articles and international conference papers and holds three patent applications. He has presented his research at a number of international conferences. He is also the co-author of the book *Gain-Cell Embedded DRAMs for Low-Power VLSI Systems-on-Chip*. His current research interests include the embedded DRAM design and optimization for low-power and high-performance operation, the SRAM design with an emphasis on improved stability, error-correction and fault-tolerant circuits, and the development of hardware-security oriented embedded memories for use in low-power applications and high-end processors. As part of his research, he led several full test-chip integrations and tapeout.

Mr. Giterman received the Presidential Scholarship for outstanding doctorate students in 2014 and the Lev-Zion Scholarship for excellent Ph.D. students in 2016.



**Alexander Fish** (M'06) received the B.Sc. degree in electrical engineering from Technion-Israel Institute of Technology, Haifa, Israel, in 1999, and the M.Sc. and Ph.D. (*summa cum laude*) degrees from the Ben-Gurion University of the Negev (BGU), Beersheba, Israel, in 2002 and 2006, respectively.

He was a Post-Doctoral Fellow with the ATIPS Laboratory, University of Calgary, Calgary, AB, Canada, from 2006 to 2008. In 2008, he joined the Electrical and Computer Engineering Department, Ben-Gurion University of the Negev, as a Faculty Member, where he founded the Low Power Circuits and Systems Laboratory, specializing in low-power circuits and systems. In 2011, he was appointed as the Head of the VLSI Systems Center, BGU. In 2012, he joined the Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel, as an Associate Professor, and the Head of the Nanoelectronics Track. He is currently the Founder and a Leading Member of Emerging Nanoscaled Integrated Circuits and Systems Labs, Bar-Ilan University. He has authored over 100 scientific papers in journals and conferences and two book chapters. His current research interests include the development of secured hardware, ultralow-power embedded memory arrays, CMOS image sensors, and high-speed and energy-efficient design techniques.

Dr. Fish is a member of the Sensory, VLSI Systems and Applications, and Biomedical Systems Technical Committees of the IEEE Circuits and Systems Society. He has co-authored papers that received the Best Paper Finalist Award at the IEEE International Symposium on Circuits and Systems (ISCAS) and the IEEE International Conference on Electronics, Circuits and Systems conferences. He also served as the chair of different tracks of various IEEE conferences. He was a Co-Organizer of many special sessions at the IEEE conferences, including the IEEE ISCAS, the IEEE Sensors, and the IEEE Sensors and Microelectronics (Elsevier), and an Associate Editor of the IEEE ACCESS, *Microelectronics* (Elsevier), and *Integration, the VLSI Journal* (Elsevier).



**Narkis Geuli** received the B.Sc. degree in electrical engineering from Technion-Israel Institute of Technology, Haifa, Israel, in 2007.

From 2006 to 2012, she was an Analog Circuit Engineer with Zoran Microelectronics Corporation, Haifa. Since 2012, she has been with the Digital Backend Team, Mellanox Technologies, Yokne'am Illit, Israel, as a Senior Staff Engineer, where she leads the IP and SRAM integration in Mellanox products. She co-operates with Bar-Ilan University, Ramat Gan, Israel, where she is involved in the design and integration of gain-cell embedded DRAM in Mellanox products.



**Elad Mentovich** received the B.Sc. degree in physics and material engineering from Technion-Israel Institute of Technology, Haifa, Israel, in 2005, and the M.Sc. and Ph.D. degrees in physical chemistry from Tel Aviv University, Tel Aviv, Israel, in 2007 and 2011, respectively.

He is currently with Mellanox Technologies, Yokne'am Illit, Israel, as a Principal Engineer.



**Andreas Burg** (S'97–M'05) was born in Munich, Germany, in 1975. He received the Dipl.Ing. degree from the Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland, in 2000, and the Dr.Sc.Techn. degree from the Integrated Systems Laboratory, ETH Zürich, in 2006.

In 1998, he was with Siemens Semiconductors, San Jose, CA, USA. During his Ph.D. studies, he was with Bell Labs Wireless Research, Holmdel, NJ, USA, for one year. From 2006 to 2007, he was a Post-Doctoral Researcher with the Integrated Systems Laboratory and the Communication Theory Group, ETH Zürich. In 2007, he co-founded Celestrius, Zürich, an ETH-spinoff in the field of MIMO wireless communication, where he was responsible for the ASIC development as the Director for VLSI. In 2009, he joined ETH Zürich as a Swiss National Science Foundation (SNF) Assistant Professor and the Head of the Signal Processing Circuits and Systems Group, Integrated Systems Laboratory. Since 2011, he has been a Tenure-Track Assistant Professor with the École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, where he is leading the Telecommunications Circuits Laboratory.

Prof. Burg is also a member of the EURASIP SAT SPCN and the IEEE TC-DISPS. In 2000, he received the Willi Studer Award and the ETH Medal for his Diploma and his Diploma thesis, respectively. He also received the ETH Medal for his Ph.D. dissertation in 2006. In 2008, he received a four-year grant from the SNF for an SNF Assistant Professorship. With his students, he received the Best Paper Award from the *EURASIP Journal on Image and Video Processing* in 2013 and the Best Demo/Paper Award at the 2013 IEEE International Symposium on Circuits and Systems, the 2013 IEEE International Conference on Electronics, Circuits and Systems, and the Fortieth Asilomar Conference on Signals, Systems and Computers 2007. He has served on the technical program committee (TPC) of various conferences on signal processing, communications, and VLSI. He was a TPC Co-Chair for VLSI-SoC 2012, the European Solid-State Circuits Conference 2016, and the IEEE Workshop on Signal Processing Systems 2017. He served as an Editor of the *IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS* in 2013 and on the Editorial Board of the *Microelectronics Journal* (Elsevier). He is an Editor of the *Journal on Signal Processing Systems* (Springer) and the *Journal of Low Power Electronics and Applications* (MDPI).



**Adam Teman** received the B.Sc. degree in electrical engineering and the M.Sc. degree from the Ben-Gurion University of the Negev, Beersheba, Israel, in 2006 and 2011, respectively, and the Ph.D. degree as part of the Low Power Circuits and Systems Laboratory from the VLSI Systems Center, Ben-Gurion University of the Negev, in 2014, under the supervision of Prof. A. Fish.

He was a Design Engineer with Marvell Semiconductors, Yokneam, Israel, from 2006 to 2007, with an emphasis on physical implementation. From 2014 to 2015, he was a Post-Doctoral Researcher with the Telecommunications Circuits Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, under a Swiss Government Excellence Scholarship. In 2015, he joined the Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel, as a Tenure-Track Senior Lecturer at the Department of Electrical Engineering and as a Partner at the Emerging Nanoscaled Integrated Circuits and Systems Labs Research Center. He has authored over 50 scientific papers and holds four patent applications. He has participated in over 15 IC tapeouts. He has co-authored the recently published book *Gain-Cell Embedded DRAMs for Low-Power VLSI Systems-on-Chip* (Springer). His current research interests include low-voltage digital design, energy-efficient SRAM, non-volatile memory, and embedded DRAM memory arrays, low-power CMOS image sensors and low-power design techniques for digital and analog VLSI chips, energy-efficient digital system implementation, approximate computing, and significance-driven computing for reliability and power optimization.

Dr. Teman was honored with the Electrical Engineering Department's Teaching Excellence Recognition at the Ben-Gurion University of the Negev from 2010 to 2012. In 2011, he received the BGU's Outstanding Project Award. He received the Yizhak Ben-Ya'akov HaCohen Prize in 2010, the BGU Rector's Prize for Outstanding Academic Achievement in 2012, the Wolf Foundation Scholarship for Excellence in 2012, and the Intel Prize for Ph.D. students in 2013. His doctoral studies were conducted under a Kretzman Foundation Fellowship. He is an Associate Editor of the *Microelectronics Journal* and a member of the technical and review boards of several conferences and journals.