

# Offset-Canceling Current-Sampling Sense Amplifier for Resistive Nonvolatile Memory in 65 nm CMOS

Taehui Na, *Student Member, IEEE*, Byungkyu Song, Jung Pill Kim, Seung H. Kang,  
and Seong-Ook Jung, *Senior Member, IEEE*

**Abstract**—Resistive nonvolatile memory (NVM) is considered to be a leading candidate for next-generation memory. However, maintaining a target sensing margin is a challenge with technology scaling because of the increased process variation and decreased read cell current. This paper proposes an offset-canceling current-sampling sense amplifier (OCCS-SA) that is intended for use in deep submicrometer resistive NVM. The proposed OCCS-SA has the three major advantages of: 1) offset voltage cancellation; 2) double sensing margin structure; and 3) strong positive feedback. The measurement results from a 65 nm test chip show that the proposed OCCS-SA achieves 2.4 times faster sensing time ( $t_{SEN}$ ) at a nominal supply voltage ( $V_{DD}$ ) of 1.0 V and a greater than 20% reduction in  $V_{DD}$  at the same  $t_{SEN}$ , compared to the state-of-the-art current-sampling-based SA, which features offset voltage cancellation and weak positive feedback.

**Index Terms**—Double sensing margin, nonvolatile memory (NVM), offset voltage cancellation, resistive random access memory (RAM), sensing margin, spin transfer torque RAM.

## I. INTRODUCTION

Although resistive nonvolatile memories (NVMs) such as spin-transfer-torque random access memory (RAM) and resistive RAM promise higher densities and lower power consumption than conventional memories such as static RAM (SRAM), dynamic RAM (DRAM), and flash memory [1]–[3], resistive NVMs suffer from a degraded sensing margin with technology scaling because of the increased process variation, reduced supply voltage ( $V_{DD}$ ), and decreased read cell current ( $I_{CELL}$ ) [4]–[6].

In general, a resistive NVM array consists of a large number of data bit cells, a small number of reference bit cells, and a sense amplifier (SA), as shown in Fig. 1(a) [4]. Because the sensing margin is predominantly determined by the SA offset voltage in deep submicrometer technology nodes [6], [7], SA offset voltage cancellation has become an essential circuit design technique not only for resistive NVM [5]–[10] but also for other memories such as SRAM [11]–[15], DRAM [16], [17], and flash memory [18], [19]. Because the gain requirement of the SA in memory devices is relatively small compared to that of an operational amplifier used in analog circuits, SA offset voltage cancellation is usually achieved by an open loop with a two-phase sensing operation, as shown in Fig. 1(b): 1) the offset voltage is sampled in phase 1 (P1) and 2) the SA input voltage is amplified without the offset voltage in phase 2 (P2).

Manuscript received June 8, 2016; revised August 4, 2016; accepted September 16, 2016. Date of publication October 25, 2016; date of current version January 30, 2017. This paper was approved by Associate Editor Vivek De.

T. Na, B. Song, and S.-O. Jung are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, South Korea (e-mail: sjung@yonsei.ac.kr).

J. P. Kim and S. H. Kang are with Advanced Technology, Qualcomm Inc., San Diego, CA 92121 USA.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2016.2612235

In addition to offset voltage cancellation, sensing circuitry with a double sensing margin structure has been actively studied to overcome the sensing margin issue [9], [20]–[22]. For a given SA input, the double sensing margin structure doubles the sensing margin by generating a variable reference signal that depends on the data bit-cell state. Fig. 1(c) shows the concept of the double sensing margin structure, including the data-dependent reference generator (DDRG): when the bit-cell state is 0 (1), $I_{CELL0}$ ( $I_{CELL1}$ ) flowing through the bit cell, together with the DDRG, sets the effective reference current ( $I_{REF\_EFF}$ ) to $I_{CELL1}$ ( $I_{CELL0}$ ), and the sensing margin is thus doubled.
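The doubling described above can be illustrated numerically. The following is a minimal sketch, not the authors' model: it assumes a conventional fixed reference placed midway between the two cell currents (the text does not state where a fixed reference sits), and the function names and current values are hypothetical.

```python
def margin_fixed_reference(i_cell0, i_cell1):
    """Worst-case current margin with one fixed reference.

    ASSUMPTION: the fixed reference sits midway between the two cell
    currents; the paper does not specify this placement.
    """
    i_ref = (i_cell0 + i_cell1) / 2.0
    return min(abs(i_cell0 - i_ref), abs(i_cell1 - i_ref))


def margin_data_dependent_reference(i_cell0, i_cell1):
    """Worst-case margin when the effective reference is the opposite-state current."""
    margins = []
    for state in (0, 1):
        i_cell = i_cell0 if state == 0 else i_cell1
        # DDRG concept of Fig. 1(c): state 0 sees I_CELL1, state 1 sees I_CELL0.
        i_ref_eff = i_cell1 if state == 0 else i_cell0
        margins.append(abs(i_cell - i_ref_eff))
    return min(margins)


# Illustrative currents in microamperes (hypothetical example values).
I_CELL0_UA, I_CELL1_UA = 26.0, 22.8
fixed = margin_fixed_reference(I_CELL0_UA, I_CELL1_UA)
ddrg = margin_data_dependent_reference(I_CELL0_UA, I_CELL1_UA)
print(f"fixed-reference margin:          {fixed:.2f} uA")
print(f"data-dependent-reference margin: {ddrg:.2f} uA")
```

Under the midpoint assumption, the data-dependent margin is exactly twice the fixed-reference margin for any pair of cell currents.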

Recently, Chang *et al.* [6] proposed a current-sampling-based SA (CSB-SA) that is capable of mitigating the offset in the SA. However, the CSB-SA still suffers from a low sensing margin and long sensing time ( $t_{SEN}$ ) because of its limited offset tolerance and weak positive feedback.

This paper proposes an offset-canceling current-sampling SA (OCCS-SA) to improve the offset tolerance and reduce $t_{SEN}$ by utilizing: 1) offset voltage cancellation; 2) double sensing margin structure; and 3) strong positive feedback. To verify these features, a 32 × 32 SA array containing 1024 CSB-SAs and 1024 OCCS-SAs was implemented with identical mismatches in a 65 nm CMOS technology. The experimental results show that the OCCS-SA achieves a 2.4 times faster $t_{SEN}$ at a nominal $V_{DD}$ of 1.0 V, and a more than 20% lower $V_{DD}$ at the same $t_{SEN}$, compared to the CSB-SA.

The remainder of this paper is organized as follows. Section II describes the proposed OCCS-SA. Section III presents the test-chip structure and experimental results. Section IV concludes the paper.

## II. PROPOSED OCCS-SA

In this section, the operation and high read stability characteristics of the proposed OCCS-SA are described and compared with the state-of-the-art CSB-SA.

### A. Circuit Diagram and Operation of the OCCS-SA

Fig. 2 shows the conceptual circuit diagrams and simulated waveforms of the state-of-the-art CSB-SA and the proposed OCCS-SA. The only difference between the OCCS-SA and the CSB-SA is the inclusion of two double sensing margin switches, DS1 and DS2, in the OCCS-SA. This difference nevertheless has a substantial influence on the sensing margin and performance, because DS1 and DS2 not only serve as the DDRG but also generate the strong positive feedback, as discussed in detail later.

Fig. 1. (a) Resistive NVM array. (b) Concept of offset voltage cancellation. (c) Concept of double sensing margin structure including DDRG.

Fig. 2. Conceptual circuit diagrams and simulated waveforms of (a) state-of-the-art CSB-SA and (b) proposed OCCS-SA. Bit-cell state 1 ( $I_{CELL} < I_{REF}$ ) is assumed in the $V_{SA1}$ and $V_{SA2}$ waveforms.

Fig. 3. Operation of the OCCS-SA. (a) Phase 1. (b) Phase 2.

Fig. 4. High read stability characteristic of the OCCS-SA in bit-cell state 1 ( $I_{CELL} < I_{REF}$ ), where $I_{M1}$ should be read as smaller than $I_{M2}$ in P1. (a) P1 without a mismatch between M1 and M2. (b) P1 with a mismatch between M1 and M2. (c) P2 in the case that incorrect current sampling occurs in P1.

Fig. 3 shows the operation of the OCCS-SA. In P1, the operations of the CSB-SA and OCCS-SA are exactly the same. Four switches (S1–S4) are turned on at the beginning of P1, and two diode-connected transistors (M1 and M2) provide precharge currents ( $I_{PRE1}$ and $I_{PRE2}$ ) to the bit-line (BL) and dummy BL together with $I_{CELL}$ and $I_{REF}$, respectively. Current $I_{M1}$ ( $I_{M2}$ ) flowing through M1 (M2) becomes $I_{CELL}$ ( $I_{REF}$ ) after a sufficient precharge time, regardless of the process variation in M1 (M2), as long as $I_{CELL}$ and $I_{REF}$ are constant in P1. At the end of P1, the voltages at nodes SA1 and SA2 ( $V_{SA1}$ and $V_{SA2}$ ) are stored on the left plate of $C_1$ and the right plate of $C_2$, respectively.

In P2, the operations of the CSB-SA and OCCS-SA differ. In both SAs, the four switches (S1–S4) are turned off. However, in the OCCS-SA, the two double sensing margin switches (DS1 and DS2) are turned on, whereas in the CSB-SA, DS1 and DS2 remain off.

In the CSB-SA, without any current path to GND, the SA1 and SA2 nodes are charged by M1 and M2 with the sampled currents $I_{M1}$ and $I_{M2}$, respectively. Thus, both $V_{SA1}$ and $V_{SA2}$ in P2 monotonically increase and ultimately reach $V_{DD}$ at the steady state. At the beginning of P2, the total current difference ( $\Delta I_{SA}$ ) between the SA1 and SA2 nodes becomes $\Delta I_{CELL}$, where $\Delta I_{CELL} = I_{CELL} - I_{REF}$, and the sign of $\Delta I_{SA}$ determines the direction of the positive feedback caused by the ac-coupling behavior of $C_1$ and $C_2$. When $\Delta I_{SA}$ is positive (negative), that is, when $I_{CELL} > I_{REF}$ ( $I_{CELL} < I_{REF}$ ), $V_{SA1}$ ( $V_{SA2}$ ) increases faster than $V_{SA2}$ ( $V_{SA1}$ ); the waveform in Fig. 2(a) shows the case of $I_{CELL} < I_{REF}$. It should be noted that the direction of the positive feedback in the CSB-SA is determined only by the current sampling in P1. Thus, incorrect current sampling caused by the process variation leads directly to read failures, as explained in detail in Section II-B. Moreover, the positive feedback without a current path to GND is relatively weak, resulting in a longer $t_{SEN}$.

In the OCCS-SA, M1 (M2) charges the SA1 (SA2) node with the sampled current $I_{M1}$ ( $I_{M2}$ ), while DS1 (DS2) discharges the SA1 (SA2) node with the current $I_{REF}$ ( $I_{CELL}$ ). Thus, the total current difference between the SA1 and SA2 nodes in the OCCS-SA ( $\Delta I_{SA} = 2\Delta I_{CELL}$ ) is twice that of the CSB-SA ( $\Delta I_{SA} = \Delta I_{CELL}$ ). As described before, DS1 and DS2 in the OCCS-SA serve as the DDRG to double $\Delta I_{SA}$. When $\Delta I_{SA}$ is positive (negative), $V_{SA1}$ increases (decreases) and $V_{SA2}$ decreases (increases) because of the difference between the charging and discharging currents. Unlike the CSB-SA with its weak positive feedback, the OCCS-SA has a current path to GND through DS1 and DS2, resulting in strong positive feedback.
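The factor of two can be checked with a short current balance at the beginning of P2. This is only a sketch under the assumption of ideal sampling in P1 ( $I_{M1} = I_{CELL}$, $I_{M2} = I_{REF}$ ); the symbols $I_{SA1}$ and $I_{SA2}$, denoting the net current into nodes SA1 and SA2, are introduced here for illustration and do not appear in the original text.

$$
\begin{aligned}
\text{CSB-SA:}\quad & I_{SA1} = I_{M1} = I_{CELL}, \qquad I_{SA2} = I_{M2} = I_{REF}, \\
& \Delta I_{SA} = I_{SA1} - I_{SA2} = I_{CELL} - I_{REF} = \Delta I_{CELL}. \\
\text{OCCS-SA:}\quad & I_{SA1} = I_{M1} - I_{REF} = I_{CELL} - I_{REF} = \Delta I_{CELL}, \\
& I_{SA2} = I_{M2} - I_{CELL} = I_{REF} - I_{CELL} = -\Delta I_{CELL}, \\
& \Delta I_{SA} = I_{SA1} - I_{SA2} = 2\,\Delta I_{CELL}.
\end{aligned}
$$

In bit-cell state 1 ( $I_{CELL} < I_{REF}$ ), this balance gives a net discharging current at SA1 and a net charging current at SA2, which agrees with $V_{SA1}$ falling and $V_{SA2}$ rising in the waveforms of Fig. 2(b).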



Fig. 5. Simulated transient responses (bit-cell state 1). (a) CSB-SA without mismatch between M1 and M2. (b) CSB-SA with 200 mV  $V_{TH}$  mismatch between M1 and M2. (c) OCCS-SA with 200 mV  $V_{TH}$  mismatch between M1 and M2 and  $t_{mismatch} = 0\text{ ns}$ . (d) OCCS-SA with 200 mV  $V_{TH}$  mismatch between M1 and M2 and  $t_{mismatch} = 5\text{ ns}$ .

### B. High Read Stability Characteristic of the OCCS-SA

Fig. 4 shows the high read stability characteristic of the OCCS-SA with a mismatch between M1 and M2. Assuming bit-cell state 1 ( $I_{CELL} < I_{REF}$ ), $I_{M1}$ should be smaller than $I_{M2}$ in P1, as shown in Fig. 4(a), leading to $V_{SA1} > V_{SA2}$ in P1 and $V_{SA1} < V_{SA2}$ in P2 (as indicated by the waveforms in Fig. 2). However, in deep submicrometer resistive NVM, $I_{CELL}$ and $I_{REF}$ are not ideal current sources because they are generated by clamp nMOS transistors, which are highly sensitive to channel length modulation. Therefore, $I_{M1}$ could be larger than $I_{M2}$ when there is a significant transistor mismatch between M1 and M2, as shown in Fig. 4(b). This results in incorrect current sampling in P1. In the CSB-SA, incorrect current sampling leads directly to read failures. In contrast, the OCCS-SA senses correctly even in the case of $I_{M1} > I_{M2}$ in P1, using the double sensing margin structure and strong positive feedback as shown in Fig. 4(c). When $I_{M1} > I_{M2}$ in P1 because of incorrect current sampling in the OCCS-SA, $V_{SA1}$ ( $V_{SA2}$ ) starts to decrease (increase) at the beginning of P2 because of the $I_{REF}$ ( $I_{CELL}$ ) flowing through DS1 (DS2). $V_{SA1}$ ( $V_{SA2}$ ) continues to decrease (increase) to almost GND ( $V_{DD}$ ) because of the strong positive feedback. Thus, the currents sampled incorrectly because of the M1 and M2 mismatch can be corrected by the OCCS-SA as long as $I_{CELL}$ and $I_{REF}$ are not reversed. Therefore, the OCCS-SA offers high read stability, resulting in a much better offset tolerance than the CSB-SA.

Fig. 5(a) shows the simulated transient response of the CSB-SA in bit-cell state 1 without mismatch between M1 and M2. Fig. 5(b) shows the simulated transient response of the CSB-SA with a 200 mV  $V_{TH}$  mismatch between M1 and M2. It shows that the incorrect current sampling in P1 results in read failures. On the other hand, Fig. 5(c), which is the simulated transient response of the OCCS-SA with the same condition as in Fig. 5(b), clearly shows that the OCCS-SA can sense correctly even in the case of incorrect current sampling in P1 by virtue of the double sensing margin structure and strong positive feedback. Furthermore, Fig. 5(d) shows that the high read stability characteristic of the OCCS-SA is maintained even if there is a timing mismatch between P1 and P2 ( $t_{mismatch} = \text{AMP rising edge} - \text{PRE falling edge}$ ) by virtue of the strong positive feedback.

Fig. 6 shows $\Delta V$ [ $= \min(V_{SA1} - V_{SA2} \text{ at bit-cell state 0},\ V_{SA2} - V_{SA1} \text{ at bit-cell state 1})$ ] versus the $V_{TH}$ mismatch between M1 and M2 when $V_{DD} = 1.0$ V and $R_{CELL}/R_{REF} = 4.5/6.0$ k$\Omega$. The $\Delta V$ of the OCCS-SA remains almost constant even with a $V_{TH}$ mismatch of a few hundred millivolts, whereas the $\Delta V$ of the CSB-SA decreases drastically with the $V_{TH}$ mismatch. This clearly shows that the OCCS-SA has a much better offset tolerance than the CSB-SA.
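The $\Delta V$ metric defined above is a worst case over the two bit-cell states, and it turns negative as soon as either state is read with the wrong polarity. A minimal sketch of that definition follows; the function name and the example voltages are hypothetical and are not taken from the simulations.

```python
def delta_v(v_sa1_state0, v_sa2_state0, v_sa1_state1, v_sa2_state1):
    """Worst-case output separation over both bit-cell states, in volts.

    State 0 is expected to end with V_SA1 > V_SA2 and state 1 with
    V_SA2 > V_SA1, so a negative return value flags a read failure.
    """
    margin_state0 = v_sa1_state0 - v_sa2_state0
    margin_state1 = v_sa2_state1 - v_sa1_state1
    return min(margin_state0, margin_state1)


# Hypothetical final node voltages (V), chosen only to exercise the function.
healthy = delta_v(0.95, 0.05, 0.04, 0.97)   # both states resolve correctly
failing = delta_v(0.90, 0.10, 0.60, 0.40)   # state 1 resolves the wrong way
print(f"healthy case: {healthy:+.2f} V")
print(f"failing case: {failing:+.2f} V")
```

Taking the minimum rather than the average means that a large margin in one state cannot hide a failure in the other.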

## III. TEST CHIP STRUCTURE AND EXPERIMENTAL RESULTS

In this section, we detail the test chip structure and describe the experimental results.

### A. Test Chip Structure

Fig. 7 shows the test chip structure implemented in the 65 nm CMOS technology. The structure uses a $32 \times 32$ SA array containing 1024 CSB-SAs and 1024 OCCS-SAs with various $R_{CELL}$ and $R_{REF}$ combinations. The structure associated with the array includes signal generators that generate the following signals: PRE for S1–S4 control, AMP for DS1 and DS2 control, LAT for the latch enable, and LAT\_d, which is a delayed LAT signal used in the associated D flip-flop (D-F/F) signals. The structure also includes a binary to thermometer code converter (BTC) to control $t_{SEN}$, decoders, selectors, and multiplexers (Muxs).

For the purpose of testing the SA yield, $R_{CELL}$ and $R_{REF}$ are implemented by diffusion resistors. A 3 k$\Omega$ diffusion resistor is implemented with a width of 0.4 $\mu\text{m}$ and a length of 11.72 $\mu\text{m}$, and the sizes of the other resistors are determined in proportion to their resistances. When the access transistor of a bit cell is assumed to be sized to the minimum transistor size for two contacts (width = 0.39 $\mu\text{m}$ and length = 0.06 $\mu\text{m}$ ), the effective number of cells per BL (CpBL) for an SA is calculated to be 1650.

The CSB-SA and OCCS-SA were implemented on the test structure together to be evaluated with identical mismatches; each SA is selected by the AMP signal in P2 (AMP = 0 for CSB-SA and AMP = 1 for OCCS-SA). Because a latch circuit is needed to generate a digital signal (0 or 1) from the outputs ( $V_{SA1}$, $V_{SA2}$ ) of the OCCS-SA, as shown in Fig. 2, a voltage-latched SA with double switches and transmission gate access transistors (DSTA-VLSA), enabled by the LAT signal, is employed to eliminate the sensing dead zone [23]. All the SAs share the same $R_{CELL}$ and $R_{REF}$, and each SA is sequentially selected to find its yield. All the SAs are precharged to GND before operation to find the true yield caused by only the transistor mismatch.

Fig. 8 shows the transistor-level schematic and transistor sizes of the OCCS-SA. To implement the two capacitors ( $C_1$ and $C_2$ ) shown in Fig. 2, pMOSCAPs (M5, M6) are used with twice the transistor size of the load pMOSs (M1, M2) and clamp nMOSs (M3, M4). Degeneration pMOSs (M7, M8) are employed to further reduce the offset caused by the load pMOSs [24].
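The proportional sizing of the diffusion resistors mentioned above can be sketched from the single stated data point (3 k$\Omega$ at 0.4 $\mu\text{m}$ by 11.72 $\mu\text{m}$). The sketch assumes that the width is held fixed and only the length is scaled, and it ignores contact and end effects; the text says only that sizes scale in proportion to resistance, so the lengths printed here are illustrative rather than the layout values.

```python
# Reference resistor stated in the text.
REF_RESISTANCE_OHM = 3000.0
REF_WIDTH_UM = 0.4
REF_LENGTH_UM = 11.72


def implied_sheet_resistance_ohm_per_sq():
    """Ohms per square implied by the reference resistor (no end effects)."""
    squares = REF_LENGTH_UM / REF_WIDTH_UM
    return REF_RESISTANCE_OHM / squares


def scaled_length_um(resistance_ohm):
    """Length for a target resistance.

    ASSUMPTION: width stays at 0.4 um and length scales linearly with
    resistance; the paper does not state which dimension is scaled.
    """
    return REF_LENGTH_UM * resistance_ohm / REF_RESISTANCE_OHM


print(f"implied sheet resistance: {implied_sheet_resistance_ohm_per_sq():.1f} ohm/sq")
for r_ohm in (4500.0, 5000.0, 5500.0, 6000.0):
    print(f"{r_ohm / 1000:.1f} kOhm -> length {scaled_length_um(r_ohm):.2f} um")
```

The four resistances in the loop are the $R_{CELL}$ and $R_{REF}$ values that appear in the measured conditions (4.5/6.0 k$\Omega$ and 5.0/5.5 k$\Omega$).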

### B. Experimental Results

Fig. 9 shows the proposed core (OCCS-SA + DSTA-VLSA) layout. The core is designed to have an area of 78.3 $\mu\text{m}^2$. The figure shows that the area overhead caused by the two pMOSCAPs is less than 10%. Fig. 10 shows a die photo of the test chip in a 65 nm CMOS technology occupying 0.3 mm$^2$.

Fig. 6. Simulated $\Delta V$ [ $= \min(V_{SA1} - V_{SA2} \text{ at bit-cell state 0},\ V_{SA2} - V_{SA1} \text{ at bit-cell state 1})$ ] versus $V_{TH}$ mismatch between M1 and M2 transistors when $V_{DD} = 1.0$ V and $R_{CELL}/R_{REF} = 4.5/6.0$ k$\Omega$.

Fig. 11 shows the measured yield relative to the $R_{CELL}/R_{REF}$ condition when $V_{DD} = 1.0$ V, the gate voltage of the clamp nMOS ( $V_{G\_clamp}$ ) = 0.8 V, and $t_{SEN} = 10$ ns. In this paper, the yield refers to the average yield between bit-cell state 0 (e.g., $R_{CELL} = 4.5$ k$\Omega$ and $R_{REF} = 6.0$ k$\Omega$ in a 4.5/6.0 k$\Omega$ condition) and bit-cell state 1 (e.g., $R_{CELL} = 6.0$ k$\Omega$ and $R_{REF} = 4.5$ k$\Omega$ in a 4.5/6.0 k$\Omega$ condition). The OCCS-SA achieves a better yield than the CSB-SA, and the yield difference increases as the resistance difference ( $\Delta R = |R_{CELL} - R_{REF}|$ ) decreases. These effects are attributed to the double sensing margin structure and the strong positive feedback of the OCCS-SA. The OCCS-SA achieves a 100% yield even in the $R_{CELL}/R_{REF} = 5.0/5.5$ k$\Omega$ condition, whereas the CSB-SA achieves only 57.8%. These results indicate that the OCCS-SA has much better offset tolerance than the CSB-SA.

Fig. 12 shows the measured yield relative to $V_{G\_clamp}$ and $I_{CELL}$ relative to $V_{G\_clamp}$ when $V_{DD} = 1.0$ V, $R_{CELL}/R_{REF} = 4.5/6.0$ k$\Omega$, and $t_{SEN} = 10$ ns. The CSB-SA cannot achieve a 100% yield even if $I_{CELL}$ is sufficiently large ( $> 26$ $\mu\text{A}$ ), because of the limited $t_{SEN}$ of 10 ns or $\Delta R$ of 1.5 k$\Omega$. In contrast, the OCCS-SA achieves a 100% yield even in the case of $V_{G\_clamp} = 0.4$ V, which corresponds to $I_{CELL} = 9.6$ $\mu\text{A}$ (8.5 $\mu\text{A}$ ) when $R_{CELL} = 4.5$ k$\Omega$ (6.0 k$\Omega$ ). Therefore, the OCCS-SA is much more efficient in terms of power than the CSB-SA.

Fig. 13 shows the measured yield relative to $t_{SEN}$ when $V_{DD} = 1.0$ V, $V_{G\_clamp} = 0.8$ V, and $R_{CELL}/R_{REF} = 4.5/6.0$ k$\Omega$. The CSB-SA cannot achieve a 100% yield even in the case of $t_{SEN} = 10$ ns. In contrast, the OCCS-SA maintains a 100% yield as $t_{SEN}$ is decreased down to 5 ns because of its double sensing margin structure and strong positive feedback. Therefore, the OCCS-SA offers a significantly improved performance compared with the CSB-SA.

Fig. 7. Test chip structure with the $32 \times 32$ SA array containing 1024 CSB-SAs and 1024 OCCS-SAs.

Fig. 8. Transistor-level schematic and transistor sizes of the OCCS-SA.

Fig. 14 shows a measured waveform of the OCCS-SA in the condition of $V_{DD} = 1.0$ V, $V_{G\_clamp} = 0.8$ V, and $R_{CELL}/R_{REF} = 4.5/6.0$ k$\Omega$. Although Fig. 14 shows only the steady-state values of $V_{SA1}$ and $V_{SA2}$ after P2 because of the limited bandwidth of the unity-gain buffer used in the test chip, it is nonetheless clear that the OCCS-SA's $V_{SA1}$ and $V_{SA2}$ become almost rail-to-rail voltages, as shown in the simulated waveforms in Figs. 2 and 5. $V_{SA1}$ tends to increase and $V_{SA2}$ tends to decrease as time elapses in the operating timeframe of $250\ \mu\text{s}$. This is because of the leakage current from and to the floated gate nodes of M1 and M2.

Fig. 15 shows a shmoo plot of $V_{DD}$ relative to $t_{SEN}$ when $R_{CELL}/R_{REF} = 4.5/6.0\ \text{k}\Omega$ and $V_{G\_clamp} = 0.8\ \text{V}$. The $t_{SEN}$ of the OCCS-SA (5 ns) is 2.4 times faster than that of the CSB-SA (12 ns) at the nominal $V_{DD}$ of 1.0 V. For the same $t_{SEN}$ from 6 to 11 ns, the OCCS-SA can also achieve a more than 20% lower $V_{DD}$ compared with that of the CSB-SA.

Fig. 9. Core (OCCS-SA + DSTA-VLSA) layout.

Table I summarizes the 65 nm test chip results. The proposed OCCS-SA features 1) offset voltage cancellation; 2) double sensing margin structure; and 3) strong positive feedback, whereas the state-of-the-art CSB-SA features 1) offset voltage cancellation and 2) weak positive feedback. The minimum $\Delta R$ that the OCCS-SA can sense correctly when $V_{DD} = 1.0\ \text{V}$, $V_{G\_clamp} = 0.8\ \text{V}$, and $t_{SEN} = 10\ \text{ns}$ is smaller than $0.5\ \text{k}\Omega$, whereas the minimum $\Delta R$ is $2.5\ \text{k}\Omega$ for the CSB-SA. The minimum $t_{SEN}$ of the OCCS-SA is 5 ns, whereas that of the CSB-SA is 12 ns, in the condition of $V_{DD} = 1.0\ \text{V}$, $V_{G\_clamp} = 0.8\ \text{V}$, and $\Delta R = 1.5\ \text{k}\Omega$. The minimum $V_{DD}$ of the OCCS-SA is 0.85 V, whereas that of the CSB-SA is 1.05 V, in the condition of $t_{SEN} = 10\ \text{ns}$, $V_{G\_clamp} = 0.8\ \text{V}$, and $\Delta R = 1.5\ \text{k}\Omega$.

Fig. 10. Die photo of the test chip implemented in a 65 nm CMOS technology.

Fig. 11. Measured yield versus $R_{\text{CELL}}/R_{\text{REF}}$ condition when $V_{\text{DD}} = 1.0$ V, $V_{\text{G\_clamp}} = 0.8$ V, and $t_{\text{SEN}} = 10$ ns.

Fig. 12. Measured yield versus $V_{\text{G\_clamp}}$ and $I_{\text{CELL}}$ versus $V_{\text{G\_clamp}}$ when $V_{\text{DD}} = 1.0$ V, $R_{\text{CELL}}/R_{\text{REF}} = 4.5/6.0$ kΩ, and $t_{\text{SEN}} = 10$ ns.

Fig. 13. Measured yield versus $t_{\text{SEN}}$ when $V_{\text{DD}} = 1.0$ V, $V_{\text{G\_clamp}} = 0.8$ V, and $R_{\text{CELL}}/R_{\text{REF}} = 4.5/6.0$ kΩ.

Fig. 14. Measured waveform of the OCCS-SA when $V_{\text{DD}} = 1.0$ V, $V_{\text{G\_clamp}} = 0.8$ V, and $R_{\text{CELL}}/R_{\text{REF}} = 4.5/6.0$ kΩ (bit-cell state = 1).

Fig. 15. Shmoo plot of $V_{\text{DD}}$ versus $t_{\text{SEN}}$ when $R_{\text{CELL}}/R_{\text{REF}} = 4.5/6.0$ kΩ and $V_{\text{G\_clamp}} = 0.8$ V.

Although both the CSB-SA and the OCCS-SA implement the two-phase sensing operation, the CSB-SA has no dc current path to GND in P2, whereas the OCCS-SA has dc current paths through DS1 and DS2. For this reason, if the two SAs operate in the same conditions with the same phase sensing times ( $t_{\text{SEN\_P1}}$ for P1, $t_{\text{SEN\_P2}}$ for P2), the read energy per bit of the OCCS-SA becomes higher than that of the CSB-SA. However, not only is the minimum $t_{\text{SEN}}$ of the OCCS-SA 2.4 times shorter than that of the CSB-SA, as shown in Table I, but also the portion of the OCCS-SA's $t_{\text{SEN}}$ occupied by $t_{\text{SEN\_P2}}$ (approximately 15%) is smaller than that occupied by $t_{\text{SEN\_P1}}$ (approximately 85%) because of the OCCS-SA's relatively large effective CpBL of 1650. Therefore, the read energy per bit of the OCCS-SA (approximately 244 fJ) is less than half that of the CSB-SA (approximately 498 fJ).
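The figures quoted in this comparison can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the per-phase durations are an extrapolation that assumes the approximate 85%/15% split applies at the OCCS-SA's minimum $t_{\text{SEN}}$ of 5 ns, which the text does not state explicitly.

```python
# Values quoted in the text and in Table I.
T_SEN_OCCS_NS = 5.0
T_SEN_CSB_NS = 12.0
E_READ_OCCS_FJ = 244.0
E_READ_CSB_FJ = 498.0
P1_SHARE_OCCS = 0.85
P2_SHARE_OCCS = 0.15

# Ratio of minimum sensing times (the "2.4 times" claim).
speedup = T_SEN_CSB_NS / T_SEN_OCCS_NS

# Ratio of read energies per bit (the "less than half" claim).
energy_ratio = E_READ_OCCS_FJ / E_READ_CSB_FJ

# ASSUMPTION: the 85%/15% split is evaluated at t_SEN = 5 ns.
t_sen_p1_ns = P1_SHARE_OCCS * T_SEN_OCCS_NS
t_sen_p2_ns = P2_SHARE_OCCS * T_SEN_OCCS_NS

print(f"t_SEN ratio (CSB/OCCS):   {speedup:.1f}x")
print(f"energy ratio (OCCS/CSB):  {energy_ratio:.2f}")
print(f"P1 / P2 duration at 5 ns: {t_sen_p1_ns:.2f} ns / {t_sen_p2_ns:.2f} ns")
```

Under that assumption, the dc current paths through DS1 and DS2 conduct for well under a nanosecond per read, which is consistent with the lower read energy despite the added paths.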

In memory design, especially in storage and main memory designs, increasing the array efficiency (or the total number of dies per wafer) is of prime importance because it is directly linked with price competitiveness. For this reason, most memory designers are reluctant to employ offset-canceling circuitry in their memory designs because it requires capacitors, as illustrated in Fig. 1(b), which can greatly degrade the array efficiency. As the technology node scales down, however, offset-canceling circuitry becomes essential because the sensing margin is predominantly determined by the SA offset voltage [6], [7], and the offset cancellation technique is much more efficient at reducing the offset voltage than simply enlarging the transistors [11]. Furthermore, the area overhead caused by the capacitors can be rendered insignificant with an optimum offset-canceling circuitry that uses pMOSCAPs comparable in size with other transistors, as demonstrated by our test chip.

TABLE I  
SUMMARY OF THE TEST CHIP RESULTS

| | CSB-SA [6] | OCCS-SA |
|---|---|---|
| Process | 65 nm | 65 nm |
| Features | Offset cancellation<br>Weak positive feedback | Offset cancellation<br>Double sensing margin<br>Strong positive feedback |
| Effective CpBL | 1650 | 1650 |
| $I_{\text{CELL}}$ @ $R_{\text{CELL}} = 4.5\ \text{k}\Omega$ ( $6.0\ \text{k}\Omega$ ), $V_{\text{DD}} = 1.0\ \text{V}$, $V_{\text{G\_clamp}} = 0.8\ \text{V}$ | 26.0 $\mu\text{A}$ (22.8 $\mu\text{A}$ ) | 26.0 $\mu\text{A}$ (22.8 $\mu\text{A}$ ) |
| Min. $\Delta R$ @ $V_{\text{DD}} = 1.0\ \text{V}$, $t_{\text{SEN}} = 10\ \text{ns}$ | 2.5 $\text{k}\Omega$ | < 0.5 $\text{k}\Omega$ |
| Min. $t_{\text{SEN}}$ @ $V_{\text{DD}} = 1.0\ \text{V}$, $\Delta R = 1.5\ \text{k}\Omega$ | 12 ns | 5 ns |
| Min. $V_{\text{DD}}$ @ $t_{\text{SEN}} = 10\ \text{ns}$, $\Delta R = 1.5\ \text{k}\Omega$ | 1.05 V | 0.85 V |

## IV. CONCLUSION

In this paper, we proposed a novel offset-tolerant SA for resistive NVM, called an OCCS-SA, to improve the offset tolerance and reduce $t_{\text{SEN}}$. Compared with the state-of-the-art CSB-SA, which features offset voltage cancellation and weak positive feedback, the OCCS-SA achieves a higher read stability by combining the two additional features of a double sensing margin structure and strong positive feedback. The experimental results from the fabricated 65 nm test chip demonstrate the effectiveness of the OCCS-SA, which achieves a 2.4 times faster $t_{\text{SEN}}$ and a greater than 20% reduction in $V_{\text{DD}}$ compared with the CSB-SA.

## ACKNOWLEDGMENT

The authors would like to thank IDEC for providing the MPW and CAD tools.

## REFERENCES

- [1] C. J. Lin *et al.*, “45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell,” in *IEDM Tech. Dig.*, Dec. 2009, pp. 1–4.
- [2] N. D. Rizzo *et al.*, “A fully functional 64 Mb DDR3 ST-MRAM built on 90 nm CMOS technology,” *IEEE Trans. Magn.*, vol. 49, no. 7, pp. 4441–4446, Jul. 2013.
- [3] A. Kawahara *et al.*, “An 8 Mb multi-layered cross-point ReRAM macro with 443 MB/s write throughput,” *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 178–185, Jan. 2013.
- [4] T. Na, J. Kim, J. P. Kim, S. H. Kang, and S.-O. Jung, “Reference-scheme study and novel reference scheme for deep submicrometer STT-RAM,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 12, pp. 3376–3385, Dec. 2014.
- [5] M. Jefremow *et al.*, “Time-differential sense amplifier for sub-80mV bitline voltage embedded STT-MRAM in 40nm CMOS,” in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2013, pp. 216–217.
- [6] M.-F. Chang *et al.*, “An offset-tolerant fast-random-read current-sampling-based sense amplifier for small-cell-current nonvolatile memory,” *IEEE J. Solid-State Circuits*, vol. 48, no. 3, pp. 864–877, Mar. 2013.
- [7] T. Na, J. Kim, J. P. Kim, S. H. Kang, and S.-O. Jung, “An offset-canceling triple-stage sensing circuit for deep submicrometer STT-RAM,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 7, pp. 1620–1624, Jul. 2014.
- [8] B. Song, T. Na, J. Kim, J. P. Kim, S. H. Kang, and S.-O. Jung, “Latch offset cancellation sense amplifier for deep submicrometer STT-RAM,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 7, pp. 1776–1784, Jul. 2015.
- [9] T. Na, J. Kim, J. P. Kim, S. H. Kang, and S.-O. Jung, “A double-sensing-margin offset-canceling dual-stage sensing circuit for resistive nonvolatile memory,” *IEEE Trans. Circuits Syst. II, Express Briefs*, vol. 62, no. 12, pp. 1109–1113, Dec. 2015.
- [10] T. Na, J. Kim, B. Song, J. P. Kim, S. H. Kang, and S.-O. Jung, “An offset-tolerant dual-reference-voltage sensing scheme for deep submicrometer STT-RAM,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 4, pp. 1361–1370, Apr. 2016.
- [11] N. Verma and A. P. Chandrakasan, “A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy,” *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [12] N. Verma and A. P. Chandrakasan, “A high-density 45 nm SRAM using small-signal non-strobed regenerative sensing,” *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 163–173, Jan. 2009.
- [13] M. Qazi, K. Stawiasz, L. Chang, and A. P. Chandrakasan, “A 512kb 8T SRAM macro operating down to 0.57 V with an AC-coupled sense amplifier and embedded data-retention-voltage sensor in 45 nm SOI CMOS,” *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 85–96, Jan. 2011.
- [14] H. Jeong *et al.*, “Switching pMOS sense amplifier for high-density low-voltage single-ended SRAM,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 6, pp. 1555–1563, Jun. 2015.
- [15] M. Khayatzadeh, F. Frustaci, D. Blaauw, D. Sylvester, and M. Alioto, “A reconfigurable sense amplifier with 3X offset reduction in 28nm FDSOI CMOS,” in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2015, pp. C270–C271.
- [16] S. Hong, S. Kim, J.-K. Wee, and S. Lee, “Low-voltage DRAM sensing scheme with offset-cancellation sense amplifier,” *IEEE J. Solid-State Circuits*, vol. 37, no. 10, pp. 1356–1360, Oct. 2002.
- [17] S. Akiyama, T. Sekiguchi, R. Takemura, A. Kotabe, and K. Itoh, “Low-Vt small-offset gated preamplifier for sub-1V gigabit DRAM arrays,” in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2009, pp. 142–143.
- [18] J. Javanifard *et al.*, “A 45nm self-aligned-contact process 1Gb NOR flash with 5MB/s program speed,” in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2008, pp. 424–624.
- [19] T. Kono *et al.*, “40-nm embedded split-gate MONOS (SG-MONOS) flash macros for automotive with 160-MHz random access for code and endurance over 10 M cycles for data at the junction temperature of 170 °C,” *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 154–166, Jan. 2014.
- [20] J. Kim, T. Na, J. P. Kim, S. H. Kang, and S.-O. Jung, “A split-path sensing circuit for spin torque transfer MRAM (STT-MRAM),” *IEEE Trans. Circuits Syst. II, Express Briefs*, vol. 61, no. 3, pp. 193–197, Mar. 2014.
- [21] M.-F. Chang *et al.*, “Embedded 1Mb ReRAM in 28nm CMOS with 0.27-to-1V read using swing-sample-and-couple sense amplifier and self-boost-write-termination scheme,” in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2014, pp. 332–333.
- [22] C. Kim, K. Kwon, C. Park, S. Jang, and J. Choi, “A covalent-bonded cross-coupled current-mode sense amplifier for STT-MRAM with 1T1MTJ common source-line structure array,” in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2015, pp. 1–3.
- [23] T. Na, S.-H. Woo, J. Kim, H. Jeong, and S.-O. Jung, “Comparative study of various latch-type sense amplifiers,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 2, pp. 425–429, Feb. 2014.
- [24] J. Kim *et al.*, “A novel sensing circuit for deep submicron spin transfer torque MRAM (STT-MRAM),” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 1, pp. 181–186, Jan. 2012.



**Taehui Na** (S'13) was born in Yesan, South Korea, in 1986. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2012, where he is currently pursuing the Ph.D. degree.

His current research interests include PVT variation-tolerant circuit and architecture designs for resistive NVM, and low-noise low-power biomedical readout circuit designs.



**Byungkyu Song** was born in Seoul, South Korea, in 1988. He received the B.S. degree in electrical and electronic engineering from Dankook University, Yong-in, South Korea, in 2013. He is currently pursuing the Ph.D. degree at Yonsei University, Seoul.

His current research interests include bit-cell structure and peripheral circuit design for STT-RAM.



**Jung Pill Kim** received the B.S. degree in electronic engineering from Hanyang University, Seoul, South Korea, in 1988, and the M.S. degree in computer science and the Ph.D. degree in electrical engineering from Harvard University, Cambridge, MA, USA, in 2000 and 2003, respectively.

From 1988 to 1998, he was with Hynix Semiconductor, Icheon, South Korea, where he was involved in the development and research of many DRAM products from 1Mb DRAM to 64Mb EDO and SDR DRAM. From 2001 to 2008, he was with Qimonda, Research Triangle Park, NC, USA, where he was involved in high-density 1Gb SDR DRAM products, low-power mobile DRAM and LPDDR1 products, and high-performance 1Gb GDDR3 products as a Design Team Leader and a Principal Design Engineer. Since 2008, he has been with Qualcomm Inc., San Diego, CA, USA, where he is involved in the development of STT-MRAM related IPs and macros. He has authored or co-authored several technical papers and holds more than 20 U.S. patents. His current research interests include low-power circuits, high-speed memory, and advanced future memory design and technologies.



**Seung H. Kang** received the B.S. and M.S. degrees from Seoul National University, Seoul, South Korea, and the Ph.D. degree in materials science and engineering from the University of California at Berkeley, Berkeley, CA, USA.

He was with the Lawrence Berkeley National Laboratory, Berkeley, where he was involved in SQUID sensors and very large scale integration interconnects. From 1998 to 2005, he was a Distinguished Member of Technical Staff with Lucent Technologies Bell Laboratories, Murray Hill, NJ, USA, where he led advanced device reliability projects. In 2006, he joined Qualcomm Inc., where he has pioneered embedded STT-MRAM and spintronic devices for mobile systems. He is currently the Director of Engineering, Corporate Research and Development with Qualcomm Technologies, Inc., San Diego, CA, USA, where he leads an Emerging Memory Technology Group for mobile systems, Internet-of-Things, and wearables. He has authored over 70 papers and delivered more than 35 keynote and invited speeches at international conferences. He holds over 350 patents granted globally.

Dr. Kang has served on numerous technical committees and is an IEEE Electron Device Society Distinguished Lecturer.



**Seong-Ook Jung** (M'00–SM'03) received the B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2002.

From 1989 to 1998, he was with Samsung Electronics Company, Ltd., Hwasung, South Korea, where he was involved in specialty memories, such as video, graphic, and window RAM, and merged memory logic. From 2001 to 2003, he was with T-RAM Inc., Mountain View, CA, USA, where he was the Leader of the Thyristor Based Memory Circuit Design Team. From 2003 to 2006, he was with Qualcomm Inc., San Diego, CA, USA, where he was involved in high-performance low-power embedded memories, process variation tolerant circuit design, and low power circuit techniques. Since 2006, he has been a Professor with Yonsei University. His current research interests include process variation tolerant, low power, and mixed-mode circuit design, and future generation memory and technology.

Dr. Jung is currently a Board Member of the IEEE SSCS Seoul Chapter.