

# A Compact-Area Low-VDDmin 6T SRAM With Improvement in Cell Stability, Read Speed, and Write Margin Using a Dual-Split-Control-Assist Scheme

Meng-Fan Chang, *Senior Member, IEEE*, Chien-Fu Chen, Ting-Hao Chang, Chi-Chang Shuai, Yen-Yao Wang, Yi-Ju Chen, and Hiroyuki Yamauchi

**Abstract**—Previous 6T SRAMs commonly employ a wordline voltage underdrive (WLUD) scheme to suppress half-select (HS) disturbs in read and write cycles, at the expense of reduced cell read current ( $I_{CELL}$ ) and degraded write margin (WM). This paper proposes the dual-split-control (DSC) scheme, including split WLs and split cell VSS (CVSS), for 6T SRAM to maintain a compact cell area and improve HS cell stability during the read and write cycles without degrading  $I_{CELL}$  and WM. A segmented CVSS-strapping scheme is developed to suppress the ground bounce on the split-CVSS lines. The CVSS voltage for the DSC6T can be generated by either a constant voltage source or a charge-sharing-based CVSS generation scheme. A 28-nm 256-kb DSC6T SRAM macro was fabricated and achieves a 280-mV lower VDDmin than a conventional 6T SRAM.

**Index Terms**—Low-voltage, read assist, SRAM, write assist.

## I. INTRODUCTION

INTELLIGENT wearable devices and the Internet of Things (IoT) require on-chip SRAM macros with a compact area to reduce costs, a low minimum supply voltage (VDDmin) to reduce power consumption, and sufficient speed to facilitate real-time computing. A typical 6T SRAM cell has a compact cell area but suffers from write failures and half-select (HS) disturbance in read and write cycles at a low supply voltage (VDD).

Manuscript received July 8, 2016; revised December 30, 2016 and April 25, 2017; accepted April 26, 2017. Date of publication May 23, 2017; date of current version August 22, 2017. This paper was approved by Associate Editor Dejan Markovic. This work was supported by MOST of Taiwan. (Corresponding author: Meng-Fan Chang.)

M.-F. Chang and Y.-J. Chen are with National Tsing Hua University, Hsinchu 30013, Taiwan (e-mail: mfchang@ee.nthu.edu.tw).

C.-F. Chen was with National Tsing Hua University, Hsinchu 30013, Taiwan. He is now with the University of Wisconsin-Madison, Madison, WI USA.

T.-H. Chang, C.-C. Shuai, and Y.-Y. Wang are with United Microelectronics Corporation, Hsinchu 300, Taiwan.

H. Yamauchi is with the Fukuoka Institute of Technology, Fukuoka, Japan.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2701547

Previous studies, as shown in Table I, sought to improve the write margin (WM) of SRAMs by: 1) boosting the wordline (BWL) [8]–[11]; 2) lowering the cell-VDD (CVDD) voltage (CVDD-Down/CVDD-D); or 3) applying a negative-bitline (NBL) voltage. The BWL scheme boosts the WL to a voltage higher than VDD during the write operation, which strengthens the pass gate (PG) of a 6T cell. However, the BWL scheme degrades the read stability of the HS cells on the activated row. The CVDD-Down scheme lowers the cell supply by  $\Delta V_{CVDD}$  ( $CVDD = VDD - \Delta V_{CVDD}$ ), weakening the pull-up PMOS (PU) to increase the WM; however, because cells share the same CVDD lines, it also degrades the hold static noise margin (HSNM) of the other cells on those lines. The NBL scheme drives the BL to a voltage ( $V_{BL}$ ) below VSS ( $V_{BL} = VSS - \Delta V_{NBL}$ ) [30], [32]–[34] to achieve a cross-point write assist. However, the NBL scheme consumes significant area and power because of its pumping capacitors ( $C_{NBL}$ ), particularly when the required  $V_{NBL}$  is large. Moreover,  $V_{NBL}$  should be kept as small as possible, because a negative BL weakly turns ON the NMOS PGs of the unaccessed cells on the selected columns and degrades their stability.
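A one-line derivation makes the last point concrete. Assuming VSS = 0 V and an unaccessed row with $V_{WL} = 0$ V, the negatively boosted BL becomes the source terminal of the nominally OFF NMOS PG, so its gate-source voltage equals the NBL depth:

$$
V_{GS,PG} = V_{WL} - V_{BL} = 0 - (VSS - \Delta V_{NBL}) = \Delta V_{NBL}
$$

Because subthreshold current rises exponentially with $V_{GS}$, leakage through these PGs grows steeply with $\Delta V_{NBL}$.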

Previous studies improve the read and HS cell stability (HS-CS) of SRAMs by using the read-decoupled (RD) or WL voltage underdrive (WLUD) scheme. Previous RD schemes employ one (1T) [19] or two (2T) [23], [25] additional transistors as a separate read port to decouple the read path from the cell storage node. RD schemes increase the read static noise margin (RSNM) and suppress the read/HS disturb, but at the expense of additional layout area for the extra transistors. Moreover, previous RD cells do not solve the HS disturb issue in write operations. Previous WLUD schemes [7], [31]–[33] are commonly used in 6T SRAMs to improve the HS-SNM and RSNM during read/write cycles by applying a WL voltage ( $V_{WL} = VDD - \Delta V_{WLUD}$ ) that is lower than VDD by  $\Delta V_{WLUD}$ . Weakening the PG of a 6T SRAM cell effectively suppresses the read and HS disturbs in both read and write cycles. However, WLUD schemes degrade the cell read current ( $I_{CELL}$ ), resulting in slower read and cycle speeds. Moreover, the weaker PG degrades the WM and necessitates a larger  $\Delta V_{CVDD}$  or  $\Delta V_{NBL}$  (and  $C_{NBL}$ ) in write-assist schemes. Unfortunately, the maximum  $\Delta V_{CVDD}$  is limited by the HSNM of the unaccessed cells on the selected columns, and a large  $C_{NBL}$  results in a large area and power overhead, particularly in macros with wide IO and a small column-multiplexing (Y-mux) ratio. Thus, for 6T cells, the tradeoffs between read/HS-CS on the one hand and  $I_{CELL}$  and WM on the other remain unresolved.
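Why the $I_{CELL}$ penalty of WLUD worsens as VDD approaches the threshold voltage can be illustrated with a rough first-order sketch based on the Sakurai-Newton alpha-power law, $I \propto (V_{GS} - V_{TH})^{\alpha}$. The threshold voltage (0.4 V), α (1.3), and the assumption that the PG alone limits the read current are illustrative choices, not parameters of the 28-nm process used in this paper.

```python
def alpha_power_current(v_gs: float, v_th: float = 0.4, alpha: float = 1.3) -> float:
    """Relative NMOS drain current from the alpha-power law (arbitrary units).

    v_th and alpha are assumed, illustrative values. Subthreshold conduction is
    ignored, so the current is 0 at or below threshold.
    """
    overdrive = v_gs - v_th
    return overdrive ** alpha if overdrive > 0 else 0.0


def wlud_current_ratio(vdd: float, delta_wlud: float,
                       v_th: float = 0.4, alpha: float = 1.3) -> float:
    """I_CELL with V_WL = VDD - delta_wlud, relative to I_CELL with V_WL = VDD.

    Assumes the pass gate, whose gate sits at V_WL, limits the read current.
    """
    full = alpha_power_current(vdd, v_th, alpha)
    under = alpha_power_current(vdd - delta_wlud, v_th, alpha)
    return under / full


for vdd in (0.9, 0.7, 0.6):
    ratio = wlud_current_ratio(vdd, delta_wlud=0.1)
    print(f"VDD = {vdd:.1f} V: I_CELL(WLUD) / I_CELL(no WLUD) = {ratio:.2f}")
```

With these assumed parameters, a 100-mV underdrive keeps roughly 0.75 of the cell current at 0.9 V but only about 0.41 at 0.6 V, because the same $\Delta V_{WLUD}$ removes a larger fraction of the gate overdrive at low VDD.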

TABLE I  
FEATURES OF PREVIOUS ASSIST SCHEMES FOR SRAM

| Scheme    | Boost Wordline (Boost WL)              | Cell VDD Down (CVDD-D)                                                   | Negative Bitline (NBL)                 | Read Decouple 8T (RD8T)                                          | Wordline Under Drive (WLUD)          |
|-----------|----------------------------------------|--------------------------------------------------------------------------|----------------------------------------|------------------------------------------------------------------|--------------------------------------|
| Pros      | Write Assist ( $\gamma$ -ratio change) | Write Assist ( $\gamma$ -ratio change)                                   | Write Assist ( $\gamma$ -ratio change) | Read Assist (decouple R/W path)                                  | Read Assist ( $\beta$ -ratio change) |
| Cons      | Read SNM degradation (Row HS)          | 1. Hold SNM degradation (Column HS)<br>2. Does not solve read/HS disturb | 1. Area penalty<br>2. Power overhead   | 1. Large cell area<br>2. Does not solve HS issue in write cycles | Write margin degradation             |

Fig. 1. (a) Schematic and (b) layout of DSC6T SRAM cell.

In order to maintain a compact cell area and to improve HS-CS during read and write cycles without degrading  $I_{CELL}$  and the WM, this paper proposes a dual-split-control (DSC) 6T SRAM.

This paper is organized as follows. Section II describes the proposed DSC6T cell. Section III describes the macro-structure and control circuits of the DSC6T SRAM. Section IV presents the comparison and experimental results. Section V concludes this paper.

## II. PROPOSED DUAL-SPLIT-CONTROL 6T SRAM

### A. DSC6T Cell Structure

Fig. 1(a) shows the schematic of the proposed DSC6T cell, which comprises two cross-coupled inverters (INV1 and INV2) and two NMOS access transistors (PG1 and PG2). INV1 consists of a pull-up PMOS (PU1) and a pull-down NMOS (PD1), and INV2 consists of a pull-up PMOS (PU2) and a pull-down NMOS (PD2). The cell storage nodes are named  $Q$  and  $QB$ . The DSC6T cell differs from a conventional 6T SRAM cell in that the WL is split into WL1 and WL2, and the cell-VSS (CVSS) line is split into CVSS1 and CVSS2.

Fig. 1(b) shows the layout of the DSC6T cell. The placement and layout of the transistors in the DSC6T are the same as those in conventional 6T SRAM cells, which enables the DSC6T to reuse the conventional 6T SRAM layout in the front-end layers and the lower metal connections.



Fig. 2. RV mode operation of a DSC6T. (a) Macro-configuration. (b) Waveform. (c) WL configuration table.

As in a conventional 6T cell, the CVSS of a DSC6T cell is routed as two horizontal metal-3 lines placed at the top and bottom edges of the cell. Unlike in a conventional 6T cell, however, these two lines are not shorted by metal-4 meshing lines; instead, WL1 and WL2 are implemented in the metal-4 layer. Accordingly, the DSC6T can use the same footprint as the foundry-provided 6T SRAM cell drawn with aggressive layout rules, with modifications only in the metal-3 and metal-4 layers. This enables the DSC6T to achieve a more compact cell area than other customized SRAM cells, which are drawn without the aggressive SRAM layout rules.

It should be noted that the default design of foundry 28-nm 6T SRAM cells employs metal-3 (M3) for WL and VSS lines. The M3 CVSS lines are generally drawn in the row direction without vertical meshing. Ground bounce (GB)/noise on VSS lines is suppressed using M4 for vertical VSS meshing. Our use of M4 for split WLs in the DSC6T prevented the use of M4 for VSS meshing, thereby increasing the risk of GB/noise. This led us to develop the segmented CVSS-strapping (SVS) scheme to suppress GB, as discussed in Section III-A.

### B. Read and Write Operations in a Regular VDD

Fig. 2 shows the macro-configuration, operation waveform, and configuration table of a DSC6T SRAM operated at a typical VDD. The test-assist-mode (TAM) input pin switches the DSC6T macro between two operation modes: regular-voltage (RV) and low-voltage (LV) modes.

In RV mode ( $TAM = 0$ ), the inputs of the WL drivers (WLDs) for WL1 and WL2 are shorted, and the CVSS1 and CVSS2 lines are both connected to VSS. With the same WL and BL timing control as a conventional 6T, the DSC6T performs read and write operations exactly as a conventional 6T does, and therefore achieves the same high-speed operation at a regular VDD.

In LV mode ( $TAM = 1$ ), the split-WL and split-CVSS functions are enabled to provide write-assist and read-assist operations, as described in Sections II-C and II-D, respectively.

### C. Write Operation in Low-Voltage Mode and Write-Assist Mechanism

Fig. 3 shows the macro-configuration and waveform of the LV-mode write operation of the DSC6T. In LV mode ( $TAM = 1$ ), WL1 and WL2 are controlled separately, and CVSS1 and CVSS2 are held at different voltages during read and write operations. A write operation is divided into two sequential sub-phases within one clock cycle: write-0 (W0) and write-1 (W1). Input data “0” ( $DIN = 0$ ) are written into the selected cells in the W0 sub-phase, and input data “1” ( $DIN = 1$ ) in the W1 sub-phase. The W0 and W1 sub-phases occur during the clock (CLK) high and low periods, respectively. WL2 is always activated after WL1 ends to ensure that the two write sub-phases do not overlap.
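The timing rules in the paragraph above can be checked mechanically. The following Python sketch is a hypothetical checker, not part of the authors' design flow: the `Pulse` class, the function name, the assumption that CLK rises at t = 0, and the 10-ns example cycle are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    """WL pulse as an interval [start, end) in ns (illustrative units)."""
    start: float
    end: float


def two_phase_write_ok(t_clk: float, t_high: float, wl1: Pulse, wl2: Pulse) -> bool:
    """Check the LV-mode write timing rules stated in the text.

    Assumes CLK rises at t = 0 and stays high for t_high. The W0 pulse on WL1
    must lie in the CLK-high period, the W1 pulse on WL2 in the CLK-low period,
    and WL2 may rise only after WL1 has fallen.
    """
    w0_in_high = 0.0 <= wl1.start < wl1.end <= t_high
    w1_in_low = t_high <= wl2.start < wl2.end <= t_clk
    # Implied by the two checks above, but kept explicit to mirror the text.
    no_overlap = wl1.end <= wl2.start
    return w0_in_high and w1_in_low and no_overlap


# Hypothetical 10-ns cycle with a 50% duty cycle.
print(two_phase_write_ok(10.0, 5.0, Pulse(0.5, 4.0), Pulse(5.5, 9.0)))  # True
print(two_phase_write_ok(10.0, 5.0, Pulse(0.5, 6.0), Pulse(5.5, 9.0)))  # False: W0 spills into CLK low
```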

In the W0 sub-phase, WL1 is driven with a pulse of width  $T_{WL}$  and amplitude  $V_{WL1} = VDD$ , while WL2 is kept at 0 V. In this sub-phase, CVSS1 is kept at 0 V while CVSS2 is raised to a CVSS-assist voltage ( $V_{SSA}$ ).

Fig. 3. Write operation of DSC6T. (a) Macro-level overview. (b) Write-0 (W0) operation. (c) Write-1 (W1) operation. (d) Write-assist mechanism.

Fig. 4. Read operation of DSC6T. (a) Macro-level overview. (b) Cell-level waveform. (c) Read-assist mechanism.

For unselected columns in the W0 sub-phase, both BL and BLB are biased at VDD. The 1-cell ( $Q = 1$ ) in an unselected (half-selected) column (HS1) does not suffer from HS disturbance, because  $Q = BL = VDD$  and PG2 is OFF. The 0-cell ( $Q = 0$ ) in an unselected (half-selected) column (HS0) experiences a voltage bump ( $V_B$ ) at its  $Q$  node. For the same  $V_B$ , the DSC6T has better cell stability (HS-SNM) than a conventional 6T, thanks to the reduced strength of PD2 and the elevated trip point ( $V_{TP2}$ ) of INV2 due to  $CVSS2 = V_{SSA}$ .

For selected columns in a W0 operation, as Fig. 3(b) shows, BL is pulled down to  $-V_{NBL}$  and BLB is kept at VDD, so data “0” ( $DIN = 0$ ) are written to node  $Q$  of the selected cells in parallel. As shown in Fig. 3(d),  $CVSS2 = V_{SSA}$  weakens PD2 and elevates the trip point ( $V_{TP2}$ ) of INV2 in the selected cell. The weakened PD2 and raised  $V_{TP2}$  make it easier to overwrite node  $QB$  from VSS to VDD, improving the WM of the DSC6T compared with operation without the split-CVSS assist ( $CVSS1 = CVSS2 = 0$  V).

It should be noted that conventional single-ended write operations, which apply VDD or 0 V through the same BL/PG for write-1 or write-0, suffer significant WM degradation compared with the differential-BL write scheme, particularly for write-1. Unlike conventional single-ended write schemes, the DSC6T always applies a low voltage (0 V or a negative voltage) to one of the BLs through its PG while the other PG is OFF. This virtual single-ended write scheme therefore achieves a higher (better) WM than the conventional single-ended write scheme.

For a selected DSC6T cell,  $CVSS2 = V_{SSA}$  improves the WM compared with  $CVSS2 = 0$  V. Compared with the WLUD + NBL scheme, the proposed split-CVSS scheme therefore improves the WM of the DSC6T and requires a much smaller  $V_{NBL}$ . This allows the DSC6T to use a smaller NBL capacitor ( $C_{NBL}$ ), reducing the area and power overhead of the NBL scheme.

In the W1 sub-phase, as Fig. 3(c) shows,  $WL2 = VDD$  and  $WL1 = 0$  V, while  $CVSS2 = 0$  V and  $CVSS1 = V_{SSA}$ . Then,  $BL = VDD$  and  $BLB = -V_{NBL}$  for selected columns while  $BL = BLB = VDD$  for unselected columns. The cell operation of W1 is, therefore, similar to W0.

Accordingly, the combination of split-WL and split-CVSS provides both write assist and improved HS-CS at the same time.

### D. Read Operation in Low-Voltage Mode

Fig. 5. Macro-structure of DSC6T with (a) CVS and (b) CSVG schemes for  $V_{SSA}$  control.

Fig. 4 shows the macro-configuration and waveform of the LV-mode read operation of the DSC6T. In LV ( $TAM = 1$ ) read mode, only WL1 is activated while WL2 is kept at 0 V, to perform a single-ended read.  $CVSS1$  is kept at 0 V while  $CVSS2$  is raised to  $V_{SSA}$ . Both BL and BLB are precharged high. Unlike the write operation, the read operation has only one phase, and the pulse length ( $T_{WL-R}$ ) of WL1 exceeds the clock high period.
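The row and column biasing described in Sections II-B to II-D can be summarized as a lookup table. The following Python sketch is a hypothetical behavioral model, not the authors' control logic: the names `Op`, `row_bias`, and `column_bias` are illustrative, the values are symbolic labels taken from the text ("VDD", "0" for 0 V or VSS, "VSSA" for $V_{SSA}$, "-VNBL" for the negative BL), and each entry gives the level while the accessed WL pulse is asserted.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"    # RV-mode write: single phase, as in a conventional 6T
    WRITE0 = "W0"      # LV-mode write-0 sub-phase (CLK high)
    WRITE1 = "W1"      # LV-mode write-1 sub-phase (CLK low)


# LV-mode (TAM = 1) levels of the accessed row while its WL pulse is asserted.
_LV_ROW = {
    Op.READ:   {"WL1": "VDD", "WL2": "0",   "CVSS1": "0",    "CVSS2": "VSSA"},
    Op.WRITE0: {"WL1": "VDD", "WL2": "0",   "CVSS1": "0",    "CVSS2": "VSSA"},
    Op.WRITE1: {"WL1": "0",   "WL2": "VDD", "CVSS1": "VSSA", "CVSS2": "0"},
}


def row_bias(tam: int, op: Op) -> dict:
    """Split-WL and split-CVSS levels for the accessed row."""
    if tam == 0:
        # RV mode: WL1/WL2 driver inputs are shorted and both CVSS lines sit at VSS.
        if op not in (Op.READ, Op.WRITE):
            raise ValueError("RV mode uses single-phase read and write")
        return {"WL1": "VDD", "WL2": "VDD", "CVSS1": "0", "CVSS2": "0"}
    if op not in _LV_ROW:
        raise ValueError("LV-mode writes are split into the W0 and W1 sub-phases")
    return dict(_LV_ROW[op])


def column_bias(op: Op, selected: bool) -> dict:
    """LV-mode BL/BLB levels during a write sub-phase ("-VNBL" is the negative BL)."""
    if op not in (Op.WRITE0, Op.WRITE1):
        raise ValueError("column_bias models the LV write sub-phases only")
    if not selected:
        return {"BL": "VDD", "BLB": "VDD"}
    if op is Op.WRITE0:
        return {"BL": "-VNBL", "BLB": "VDD"}
    return {"BL": "VDD", "BLB": "-VNBL"}
```

For example, `row_bias(1, Op.WRITE1)` returns WL2 at VDD with CVSS1 raised to VSSA, the mirror image of the W0 sub-phase.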

Fig. 4(b) shows the waveform for a cell in either a selected or an unselected column in LV read mode. When the accessed cell stores data-1 (1-cell), BL stays high, and the 1-cell does not suffer read disturbance from BL/BLB because its PG2 is OFF ( $WL2 = 0$ ). As a result, the RSNM of a 1-cell is only slightly lower than its HSNM (because of  $CVSS2 = V_{SSA}$ ) and much higher than the RSNM of a conventional 6T SRAM cell.

When the accessed cell stores data-0 (0-cell), BL is discharged by the cell current ( $I_{CELL}$ ), developing a BL voltage swing ( $V_{BLS}$ ) during  $T_{WL-R}$ . As Fig. 4(c) shows,  $CVSS2 = V_{SSA}$  weakens PD2 and elevates the trip point ( $V_{TP2}$ ) of INV2 during the WL1 pulse. Consequently, a DSC6T cell is harder to flip and tolerates a higher voltage bump ( $V_B$ ) at storage node  $Q$  than a conventional 6T SRAM cell. The DSC6T therefore achieves higher cell stability (RSNM) for both selected and half-selected cells than a conventional 6T SRAM.
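To first order, assuming a constant $I_{CELL}$ and a lumped BL capacitance $C_{BL}$ (a symbol introduced here, not defined in the text), the swing developed by a 0-cell and the WL pulse needed to reach a target swing are:

$$
V_{BLS} \approx \frac{I_{CELL}\,T_{WL-R}}{C_{BL}}, \qquad T_{WL-R} \approx \frac{C_{BL}\,V_{BLS}}{I_{CELL}}
$$

A 1-cell leaves BL at its precharge level, so the single-ended sense amplifier only has to distinguish no swing from $V_{BLS}$; a larger $I_{CELL}$ at $V_{WL} = VDD$ shortens the required $T_{WL-R}$ proportionally.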

The improved SNM at  $V_{WL} = VDD$  enables DSC6T cells to achieve a low read-VDDmin without the  $I_{CELL}$  degradation that a lowered  $V_{WL}$  causes in WLUD schemes. When VDD is low (near the threshold voltage), single-ended sensing at  $V_{WL} = VDD$  provides a much larger  $I_{CELL}$  and a faster read than a differential read with the WL-underdrive (WLUD) scheme. In this experiment, the employed sense amplifier [28] is unable to track BL swings across a wide VDD range; an optimized sensing scheme for the proposed SRAM over a wide VDD range is left for future work.

## III. MACRO-STRUCTURE OF DSC6T SRAM

Fig. 5 shows the macro-structure of the proposed DSC6T SRAM. An SVS scheme is developed for the DSC6T to suppress GB on the split-CVSS lines; it is compatible with both an external voltage source and the charge-sharing-based CVSS generation (CSVG) scheme. The CVSS voltage ( $V_{SSA}$ ) for the DSC6T can be provided either by a constant voltage source (CVS) or by a built-in CSVG circuit. The CVS scheme is simple to operate and incurs a small area overhead, while the CSVG scheme is proposed to achieve lower power consumption than the CVS scheme. The operations of the CVS, CSVG, and SVS schemes are described below.

### A. Segmented CVSS-Strapping Scheme

Fig. 6 presents the SVS scheme for the DSC6T SRAM. To suppress GB on the CVSS lines, every  $p$  CVSS1/CVSS2 lines in the cell array are shorted together. The SVS scheme employs two types of CVSS meshing cells: the array-edge mesh (AEM) and the intra-array mesh (IAM) cells.

Each AEM cell includes two pairs of NMOS switches [VSS switches (VSS-SW) and global-CVSS switches (GCVSS-SW)] to connect CVSS1/CVSS2 either to VSS or to the GCVSS voltage line. The GCVSS line can be driven by an external voltage source, by the CVS output, or by the built-in CSVG circuit. The CVSS1/CVSS2 lines of  $p$  rows are vertically shorted together within an AEM cell, which has the same height as  $q$  DSC6T cells.

Fig. 6. (a) Array structure of a ground-strapping scheme in a DSC6T. Schematic and layout of (b) AEM and (c) IAM cells.

An IAM cell is placed between every group of  $q$  regular cells in a row. Each IAM cell provides two vertical metal-2 lines for CVSS1 and CVSS2 meshing across the  $p$  rows. To preserve layout regularity (and thus consistent proximity effects) in the cell array, the IAM cell uses the same topology and layout as a regular DSC6T SRAM cell, differing only in its via12/via23 patterns. The two vertical metal-2 meshing lines in an IAM cell are the lines that serve as BL/BLB in regular DSC6T cells. The IAM cell adds via23s to connect these two vertical metal-2 meshing lines to the horizontal metal-3 CVSS1/CVSS2 lines. In an IAM cell, the via12 between metal-2 and the metal-1 island at the drain terminal of PG1/PG2 is removed, so that nodes  $Q/QB$  cannot disturb the voltage on CVSS1/CVSS2 when WL1/WL2 is ON. The layout of the IAM cell is presented in Fig. 6(c).
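A rough estimate of the IAM width cost follows from the placement rule above. This Python sketch assumes, for illustration only, that IAM cells sit only between groups of $q$ regular columns and that each IAM cell is as wide as a DSC6T cell (plausible because it reuses the cell layout); AEM cells and routing are ignored, so it is not the area model behind the overheads reported in Section IV.

```python
import math


def iam_overhead(n_columns: int, q: int) -> tuple[int, float]:
    """Return (IAM cells per row, IAM width overhead relative to the regular array).

    Assumptions for this sketch: one IAM column is inserted between each group of
    q regular columns (none at the array edges), and an IAM cell occupies the
    same width as a DSC6T cell. AEM cells and routing overhead are not counted.
    """
    if q <= 0 or n_columns <= 0:
        raise ValueError("n_columns and q must be positive")
    n_iam = math.ceil(n_columns / q) - 1
    return n_iam, n_iam / n_columns


for q in (8, 16, 32):
    n_iam, overhead = iam_overhead(128, q)
    print(f"q = {q:2d}: {n_iam} IAM cells per row, ~{overhead:.1%} extra width")
```

A smaller $q$ straps the CVSS lines more often, which lowers GB at the price of the larger width overhead computed here.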

If the DSC6T macro uses a CVS to supply  $V_{SSA}$ ,  $p$  can be as large as the number of rows in the cell array. If it uses CSVG,  $p$  is selected based on the target ratio between  $C_{CVSS}$  and  $C_{RCVSS}$ .

### B. CVSS Voltage ( $V_{SSA}$ ) Control Schemes

1) *Constant Voltage Source Scheme*: Fig. 7 presents two approaches to provide CVS for the  $V_{SSA}$ . It should be noted that the current consumption for the CVSS lines switching between VSS and  $V_{SSA}$  is not significant compared with the regular read and write operation of an SRAM macro. In many state-of-the-art energy efficient chips, there is an on-chip voltage regulator (VR) with numerous output voltages for multiple voltage domains within a chip. Accordingly, adding one output voltage with small current load does not cause significant power and area overhead for a VR.

Fig. 7(b) presents a typical voltage-divider-based voltage generator (VDVG) as the built-in CVS solution when an on-chip VR is not available. In standby mode, the VDVG is disabled to suppress power consumption. In active mode,  $N1$ ,  $N2$ , and  $N3$  form a voltage divider and generate the  $V_{SSA}$  voltage.

2) *Charge-Sharing-Based CVSS Generation Scheme*: To achieve lower power consumption than the CVS scheme, this paper proposes a CSVG scheme for  $V_{SSA}$  generation.

Fig. 7. Two common CVS schemes for  $V_{SSA}$ . (a) Using on-chip VR. (b) Built-in VDVG.

a) *Structure and challenges of CSVG*: Fig. 8 presents the structure and operation waveforms of the proposed CSVG scheme. The CSVG comprises two sets of replica rows (RRS1 and RRS2), replica-CVSS switches (RSW), a replica-CVSS precharge controller (RPC), and a data-pattern detector (DPD). Each RRS consists of  $p$  replica rows (RRs) of replica DSC6T cells (ROC); in this paper,  $p = 3$ . All the replica cells in RR[0] and RR[1] store data-0 (0-cells), and all the replica cells in RR[2] store data-1 (1-cells). The selected CVSS line in an RR is denoted RCVSS. The parasitic load on CVSS1/CVSS2 ( $C_{CVSS}$ ) depends on the data pattern of the cells in the accessed row. If a DSC6T cell stores data 0 (0-cell), only the parasitic load at the source terminal of PD2 is connected to CVSS2. If a DSC6T cell stores data 1 (1-cell), PD2 is ON, so both the parasitic load at node  $QB$  and that at the source terminal of PD2 are connected to CVSS2. The parasitic load at node  $QB$  includes the gate capacitance of INV1 and the drain capacitances of PU2, PG2, and PD2. Accordingly, the row in which all cells are 0-cells has the minimum  $C_{CVSS}$  ( $C_{CVSS\_MIN}$ ), while the row in which all cells are 1-cells has the maximum  $C_{CVSS}$  ( $C_{CVSS\_MAX}$ ).
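The data-pattern dependence described above can be written as a simple sum over the cells of the accessed row. In this Python sketch the per-cell capacitances are lumped into two placeholder parameters (`c_src` for the PD2 source, `c_qb` for node QB), and the optional line term `c_wire` is an added assumption; none of the numbers come from the paper.

```python
def cvss_capacitance(row_bits: list[int], c_src: float, c_qb: float,
                     c_wire: float = 0.0) -> float:
    """Parasitic load on CVSS2 of the accessed row (units follow the inputs).

    A 0-cell (Q = 0) contributes only the PD2 source capacitance c_src. In a
    1-cell (Q = 1) PD2 is ON, so node QB (INV1 gate plus PU2/PG2/PD2 drains,
    lumped here as c_qb) is also tied to CVSS2. c_wire is an assumed line term.
    """
    n_ones = sum(1 for bit in row_bits if bit)
    n_zeros = len(row_bits) - n_ones
    return c_wire + n_zeros * c_src + n_ones * (c_src + c_qb)


# Placeholder per-cell values (not from the paper) for a 128-cell row.
C_SRC, C_QB = 0.05, 0.20
c_min = cvss_capacitance([0] * 128, C_SRC, C_QB)   # all 0-cells -> C_CVSS_MIN
c_max = cvss_capacitance([1] * 128, C_SRC, C_QB)   # all 1-cells -> C_CVSS_MAX
print(f"C_CVSS_MAX / C_CVSS_MIN = {c_max / c_min:.1f}")
```

With `c_wire = 0`, the max/min ratio reduces to `(c_src + c_qb) / c_src`, which is why a fixed amount of shared charge cannot produce a pattern-independent $V_{SSA}$.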

b) *Operation of CSVG*: To account for the pattern-dependent  $C_{CVSS}$  of the accessed CVSS1/CVSS2 lines, the proposed CSVG scheme uses three phases. In brief, the first phase generates a baseline voltage on CVSS that keeps  $V_{SSA}$  from becoming too high when the accessed CVSS has the minimum parasitic load. The second phase raises the CVSS voltage according to the parasitic load of the accessed CVSS lines. The third phase restores the RRS to its initial (precharged) state. These three phases are detailed as follows.

In phase-1, RSWEN[0] connects the RCVSS of RRS1 (RCVSS[0]) to the GCVSS line, which is in turn connected to the selected CVSS lines of the regular cell array through the AEM cells. The charge stored on the parasitic capacitance of RCVSS[1] ( $C_{RCVSS1}$ ) is then shared with the selected  $C_{CVSS}$  to generate a voltage ( $V_{SSA1}$ ) on CVSS2/CVSS1. By properly designing the ratio between  $C_{RCVSS1}$  and  $C_{CVSS\_MIN}$ ,  $V_{SSA1}$  is kept from exceeding the upper limit of  $V_{SSA}$  ( $V_{SSA\_MAX}$ ).
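The capacitance-ratio condition can be made explicit under idealized assumptions: lossless charge sharing, the accessed CVSS line starting at 0 V, the replica line precharged to $V_{RCVSS\_PRE}$ in phase-3, and no cell current during sharing. Charge conservation then gives

$$
V_{SSA1} = \frac{C_{RCVSS1}}{C_{RCVSS1} + C_{CVSS}}\,V_{RCVSS\_PRE}
$$

Since $V_{SSA1}$ decreases as $C_{CVSS}$ grows, the worst case is $C_{CVSS} = C_{CVSS\_MIN}$, and for $V_{RCVSS\_PRE} > V_{SSA\_MAX}$:

$$
V_{SSA1} \le V_{SSA\_MAX} \ \text{for all}\ C_{CVSS} \ge C_{CVSS\_MIN}
\iff
\frac{C_{RCVSS1}}{C_{CVSS\_MIN}} \le \frac{V_{SSA\_MAX}}{V_{RCVSS\_PRE} - V_{SSA\_MAX}}
$$

A heavily loaded CVSS line therefore ends phase-1 at a lower $V_{SSA1}$, which is the signal the DPD senses in phase-2.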

In phase-2, the DPD detects the level of  $V_{SSA1}$ , which depends on the accessed  $C_{CVSS}$ , and outputs a digital code (RSWEN[2:1]) to control RSW[2] and RSW[1]. The enabled RSWs connect additional RCVSS lines to the selected CVSS according to RSWEN[2:1]. When the accessed CVSS line has a small  $C_{CVSS}$ , as shown in Fig. 8(b),  $V_{SSA1}$  is high and near the target  $V_{SSA}$ , so RSWEN[2:1] = (0, 0) and both RSW[2] and RSW[1] remain OFF. When the accessed CVSS line has a large  $C_{CVSS}$ , as shown in Fig. 8(c),  $V_{SSA1}$  is low, so RSWEN[2:1] = (1, 1) and both RSW[2] and RSW[1] are turned ON to connect all RCVSS lines to GCVSS. When the accessed CVSS line has a medium-low  $C_{CVSS}$ , as shown in Fig. 8(d),  $V_{SSA1}$  is medium-high, so RSWEN[2:1] = (0, 1) and only RSW[1] is turned ON to connect RCVSS[1] to GCVSS. The selected RCVSS lines then provide additional charge and raise the CVSS voltage from  $V_{SSA1}$  to  $V_{SSA2}$ .
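The phase-1/phase-2 behavior can be sketched as an ideal charge-sharing model. This is a hypothetical behavioral sketch, not the authors' circuit: it assumes lossless charge sharing, no cell current during sharing, an accessed CVSS line starting at 0 V, and hypothetical DPD thresholds `th_high` and `th_low`. The equal replica capacitances in the example are placeholders; in the actual CSVG, RR[2] stores 1-cells and would load its RCVSS more heavily.

```python
def csvg_share(c_cvss: float, c_rcvss: list[float], v_pre: float,
               th_high: float, th_low: float) -> tuple[tuple[int, int], float, float]:
    """Behavioral sketch of CSVG phase-1 and phase-2 with ideal charge sharing.

    c_rcvss[k] is the capacitance of RCVSS[k]; all are precharged to v_pre
    (VDD in the paper). The accessed CVSS line starts at 0 V. th_high > th_low
    are hypothetical DPD thresholds. Returns (RSWEN[2:1], V_SSA1, V_SSA2).
    """
    # Phase-1: RCVSS[0] shares its charge with the accessed CVSS line.
    c_node = c_rcvss[0] + c_cvss
    v_ssa1 = c_rcvss[0] * v_pre / c_node

    # Phase-2: the DPD maps V_SSA1 to RSWEN[2:1] (lower V_SSA1 means larger C_CVSS).
    if v_ssa1 >= th_high:      # small C_CVSS: already near the target
        rswen = (0, 0)
    elif v_ssa1 >= th_low:     # medium-low C_CVSS: add RCVSS[1]
        rswen = (0, 1)
    else:                      # large C_CVSS: add RCVSS[1] and RCVSS[2]
        rswen = (1, 1)

    added = (c_rcvss[1] if rswen[1] else 0.0) + (c_rcvss[2] if rswen[0] else 0.0)
    # Charge conservation: node charge plus the newly connected precharged lines.
    v_ssa2 = (c_node * v_ssa1 + added * v_pre) / (c_node + added)
    return rswen, v_ssa1, v_ssa2


for c_cvss in (10.0, 20.0, 40.0):   # placeholder capacitances, arbitrary units
    code, v1, v2 = csvg_share(c_cvss, [5.0, 5.0, 5.0], v_pre=0.9,
                              th_high=0.25, th_low=0.15)
    print(f"C_CVSS = {c_cvss:4.1f}: RSWEN[2:1] = {code}, "
          f"V_SSA1 = {v1:.3f} V, V_SSA2 = {v2:.3f} V")
```

With these placeholder values, the three example loads produce the three RSWEN[2:1] codes described above, and the extra replica charge pulls the heavily loaded cases back up toward the lightly loaded one.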

In phase-3, the RPC precharges the RCVSS lines in the selected RRS to a target voltage  $V_{RCVSS\_PRE}$ , which equals VDD in this paper. RRS1 and RRS2 are precharged alternately within one clock cycle: when the clock is low, RRS1 is precharged while RRS2 performs the charge-sharing operations (phase-1 and phase-2); when the clock is high, RRS2 is precharged while RRS1 performs the charge-sharing operations. By hiding phase-3 in the inactive period of each RRS, this precharge operation imposes no timing penalty on read/write operations.

## IV. PERFORMANCE AND MEASUREMENT RESULT

### A. Performance

Fig. 9 presents the HSNM of neighboring cells and the RSNM of selected cells versus  $V_{SSA}$  at a given VDD for the DSC6T cell. As  $V_{SSA}$  increases, RSNM increases while HSNM decreases. Accordingly, as long as  $V_{SSA}$  stays below the value that causes hold failure ( $V_{SSA\_MAX}$ ), a higher  $V_{SSA}$  helps suppress the HS disturb in both read and write operations.

Fig. 10 shows VDD versus HSNM and RSNM at three different  $V_{SSA}$  values. With proper selection of  $V_{SSA}$ , DSC6T may trade the HSNM to improve the RSNM for lowering VDDmin.

Fig. 8. (a) Schematic of the CSVG scheme. The waveform of CSVG with (b) small  $C_{CVSS}$ , (c) large  $C_{CVSS}$ , and (d) medium-low  $C_{CVSS}$ .

Fig. 11 shows the worst-case WM of the DSC6T and of the conventional 6T with and without the WLUD scheme. This analysis is performed at the slow-NMOS-fast-PMOS global corner with 32 000 samples using Monte Carlo simulation. Because of its pseudo-single-ended write operation, the DSC6T with  $V_{SSA} = 100$  mV has a slightly smaller WM than the conventional 6T SRAM without WLUD when VDD is high; this difference shrinks in low-VDD operation. Compared with the conventional 6T with the WLUD scheme, the DSC6T achieves a higher WM as VDD decreases, mainly because of its higher WL voltage. This improved WM enables the DSC6T to employ a smaller  $V_{NBL}$  and a smaller NBL capacitor for a target WM.

Fig. 9. HSNM and RSNM versus  $V_{SSA}$ .

Fig. 10. HSNM and RSNM versus VDD.

Fig. 11. WM versus VDD.

Fig. 12 compares the read access time of the DSC6T and a conventional 6T with the WLUD scheme. This analysis is performed at the slow-NMOS-slow-PMOS global corner with 32 000 samples using Monte Carlo simulation. In low-VDD mode, the single-ended read of the DSC6T requires twice the BL voltage swing of a differential read. Nevertheless, the read speed of the DSC6T is still 2.6 times faster than that of the WLUD scheme, thanks to its higher WL voltage and larger  $I_{CELL}$ . The DSC6T can further improve its read speed by using an advanced single-ended sense amplifier, such as the data-aware self-reference scheme [28]. With proper write assist (the NBL scheme), the write times are much shorter than the single-ended read delay; therefore, the two-phase write operation does not limit the macro speed.
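The swing-versus-current tradeoff in the paragraph above can be quantified with the first-order relation $t = C_{BL}\,\Delta V / I_{CELL}$. This Python sketch is an illustrative model only: it assumes a constant cell current and takes the doubled single-ended swing from the text, while WL, decoder, and sense-amplifier delays are ignored; the function names are hypothetical.

```python
def bl_development_time(c_bl: float, swing: float, i_cell: float) -> float:
    """Time for a constant cell current to develop a BL swing: t = C_BL * swing / I_CELL."""
    return c_bl * swing / i_cell


def dsc_vs_wlud_speedup(i_dsc: float, i_wlud: float, diff_swing: float,
                        c_bl: float = 1.0) -> float:
    """Ratio t_WLUD / t_DSC of BL-development times (> 1 means DSC6T is faster).

    Assumes the single-ended DSC6T read needs twice the differential swing, as
    stated in the text; WL, decoder, and sense-amplifier delays are ignored.
    """
    t_wlud = bl_development_time(c_bl, diff_swing, i_wlud)
    t_dsc = bl_development_time(c_bl, 2.0 * diff_swing, i_dsc)
    return t_wlud / t_dsc
```

In this simplified view the ratio reduces to $I_{CELL,DSC} / (2\,I_{CELL,WLUD})$, so the single-ended read is faster only if its full-VDD WL more than doubles the cell current relative to WLUD. The 2.6× figure from Fig. 12 is a macro-level simulation result that this first-order sketch does not attempt to reproduce.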

Fig. 13 presents the simulated GB versus the number of strapped CVSS lines in the SVS scheme. In a 256-kb macro, the GB can be suppressed to below 22 mV at a high VDD (0.9 V) with a 5% area overhead. In low-VDD operation, which is preferred by many wearable and IoT devices, the  $I_{CELL}$  of a 6T SRAM cell is far smaller than at high VDD, which makes the GB ( $V_{GB}$ ) on CVSS insignificant. The SVS scheme achieves a sub-10-mV  $V_{GB}$  with only a 2.5% area overhead for a 256-kb SRAM macro. At a regular VDD, because it operates in the same way as a conventional 6T, the DSC6T without assist consumes almost the same read and write power as a conventional 6T SRAM without assist.

The DSC6T employs two WLs to control the PGs of a 6T SRAM cell, so each WL carries only half of the gate-capacitance load of a conventional 6T WL. The two metal lines used for WL1 and WL2 increase the total WL plate area; however, routing them in a higher metal layer reduces the WL-to-device-layer parasitic capacitance. This lowers the parasitic capacitance of a single WL1/WL2 in the DSC6T below that of a conventional 6T WL. As a result, as Fig. 14 shows, the overall WL parasitic capacitance of the DSC6T for write operations (access of WL1 and WL2) is 53% higher than that of a conventional 6T, whereas for read operations (access of WL1 only) it is 24% lower. Fortunately, the WLDs do not account for a significant share of the write power.
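The two percentages above are mutually consistent, as a short calculation shows. This Python sketch assumes that WL1 and WL2 have equal capacitance and that a write charges both lines once (one per sub-phase); the function name is hypothetical.

```python
def dsc_wl_load(split_wl_ratio: float) -> tuple[float, float]:
    """(read, write) WL capacitance of DSC6T relative to a conventional 6T WL.

    split_wl_ratio is C(WL1) / C(conventional WL), assumed equal for WL1 and WL2.
    A read drives WL1 only; a write drives WL1 and then WL2 (W0 and W1 sub-phases).
    """
    return split_wl_ratio, 2.0 * split_wl_ratio


read_ratio, write_ratio = dsc_wl_load(1 - 0.24)   # read load 24% lower, per Fig. 14
print(f"read: {read_ratio - 1:+.0%}, write: {write_ratio - 1:+.0%}")
```

This prints `read: -24%, write: +52%`, which agrees with the reported 53% write increase to within the rounding of the quoted percentages.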

Each of the split WLs has lower parasitic load than do the WLs in conventional 6T SRAMs. Thus, our DSC6T is not prone to the degradation in WL rising/falling times found in conventional 6T for various array sizes. For large arrays with long WLs, the rising/falling times in the WL signal of DSC6T are faster than those of conventional 6T. However, as the length of the WL increases, the parasitic load and read-current load on CVSS lines also increase. This extends the time required for CVSS voltage stabilization and increases GB. Thus, larger transistors are required for AEM and more IAM cells (larger  $q$ ) are needed for cell arrays with long WLs.

Fig. 15(a) presents the normalized energy consumption of the read operation for various read-assist schemes with the same HS-CS at a given low VDD (0.6 V). Many WLUD schemes employ local voltage dividers or voltage clamps in the WLD to generate the required WL voltage ( $V_{WL}$ ), at the cost of extra power from the dc current needed to generate  $\Delta V_{WLUD}$ . Moreover, the longer WL pulse ( $T_{WL}$ ) required by the degraded  $I_{CELL}$  and WM at the lower  $V_{WL}$  further increases the power overhead of the WLUD scheme. Thanks to a shorter cycle time and a shorter dc-current period, the DSC6T with VDVG consumes 54% less energy than WLUD. Because the power overhead of the CSVG scheme is much smaller than that of the WLUD scheme, the DSC6T-CSVG consumes 74% less power than the WLUD scheme.



Fig. 12. Speed comparison. (a) Conceptual waveform. (b) Normalized macro-access time.



Fig. 13. Performance of SVS. (a) CVSS bounce versus number ( $q$ ) of strapped CVSS lines. (b) CVSS bounce versus area penalty.



Fig. 14. Parasitic load on WLS in conventional 6T and DSC6T.

Fig. 15(b) presents the normalized energy consumption of the write operation for various assist schemes with the same HS-CS and WM at a given low VDD (0.6 V). Thanks to its higher  $V_{WL}$  and better WM, the DSC6T requires a smaller  $C_{NBL}$  and less power for NBL-assist operation than the WLUD scheme. Thus, DSC6T-VDVG and DSC6T-CSVG consume 52% and 76% less write energy than WLUD, respectively.

Fig. 16 shows the simulated dynamic and standby energy of read and write operations across VDD. At room temperature, dynamic energy dominates the energy consumption of read and write operations, whereas at high temperature, leakage current becomes significant.

Fig. 15. Normalized power consumption of (a) read and (b) write operations for DSC6T and conventional WLUD schemes.

Fig. 16. Simulated dynamic and standby energy for read and write operations across VDD.

Single-ended sensing makes the DSC6T consume much less read power when reading a 1-cell than when reading a 0-cell, since a 1-cell does not discharge the BL. Thus, an application-dependent data-inverting (ADDI) scheme, which uses data inversion to ensure that 1-cells outnumber 0-cells in the cell array, reduces the read power consumption of the DSC6T below that of a conventional differential sensing scheme. ADDI can be implemented in the SRAM macros or in the data-bus controller. Fig. 17 shows the difference in read-cycle power consumption between all-1 and all-0 data patterns for SRAM macros (BL length = 256 b) with various numbers of IO pins. As the number of IO pins increased, the read-power benefit of ADDI increased. In an SRAM macro with 64-b IO and 256 cells per BL, the read power consumed with the all-1 pattern was 48% less than that with the all-0 pattern.
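The text does not specify how ADDI records which data were inverted. One possible realization, assumed here for illustration and in the spirit of bus-invert coding, is a per-word invert flag: a word with fewer 1s than 0s is stored complemented. The function names and the per-word flag are hypothetical, and the flag itself would occupy one extra cell per word.

```python
def addi_encode(word: int, width: int) -> tuple[int, int]:
    """Hypothetical per-word ADDI encoder: returns (stored_word, invert_flag).

    If the word holds fewer 1s than 0s, its complement is stored and the flag set,
    so every stored word is at least half 1-cells (cheaper to read with
    single-ended sensing).
    """
    mask = (1 << width) - 1
    word &= mask
    if 2 * bin(word).count("1") < width:
        return ~word & mask, 1
    return word, 0


def addi_decode(stored: int, flag: int, width: int) -> int:
    """Undo the optional inversion applied by addi_encode."""
    mask = (1 << width) - 1
    return ~stored & mask if flag else stored & mask
```

For example, `addi_encode(0b00000001, 8)` stores `0b11111110` with the flag set, and `addi_decode` restores the original word on readout.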

Table II compares the proposed DSC6T with previous assist schemes. In this comparison, all of the assist schemes were implemented using the same technology node and under the same conditions: a 16-b IO 256-kb SRAM macro with 256 rows and 128 columns per sub-array; VDD = 0.5 V; TT corner at room temperature. Moreover, the same assist voltages were applied for all of the assist schemes: $\Delta V_{WLUD} = 100$ mV, $\Delta V_{NBL} = 100$ mV, $V_{SSA} = 100$ mV, and $\Delta V_{CVDD} = 100$ mV. Note that the values presented in Table II would vary if the technology node, circuit style, and/or simulation conditions were altered.

TABLE II  
COMPARISON TABLE

| Assist Types & Cell Structure            | Conv. 6T | WLUD+NBL                               | WLUD+CVDD-Down                                            | DSC+NBL\*<sup>3</sup><br>(This work) |
|------------------------------------------|----------|----------------------------------------|-----------------------------------------------------------|--------------------------------------|
| Cell size                                | 1x       | 1x                                     | 1x                                                        | 1x                                   |
| Half-Select (HS)/Read Assist             | No       | WLUD                                   | WLUD                                                      | S-WL+S-CVSS                          |
| Write Assist                             | No       | NBL                                    | CVDD-Down                                                 | S-CVSS + NBL-light                   |
| Margin Trade-off                         | N/A      | $I_{CELL}$ vs. HS-SNM<br>HS-SNM vs. WM | $I_{CELL}$ vs. HS-SNM<br>HS-SNM vs. WM<br>Hold-SNM vs. WM | HS-SNM vs. Hold-SNM                  |
| $I_{CELL} \times HS\text{-SNM}^{\ast 1}$ | N/A      | 1x                                     | 1x                                                        | <b>3.5x</b>                          |
| Macro Read Delay                         | Fail     | 2.6x                                   | 2.6x                                                      | <b>1x</b>                            |
| Macro Area\*<sup>1</sup>                 | 1x       | 1.17x                                  | 1.21x                                                     | <b>1.08x</b>                         |
| Read Energy ($E_R$)\*<sup>1, 2</sup>     | Fail     | 4.01x                                  | 4.01x                                                     | <b>1x</b>                            |
| Write Energy ($E_W$)\*<sup>1</sup>       | Fail     | 3.7x                                   | 4.5x                                                      | <b>1x</b>                            |

\*<sup>1</sup> At VDD = 0.5 V, TT corner, 25 °C, Ymux = 8, 256-kb macro.

\*<sup>2</sup> Data pattern: 50% 1-cells and 50% 0-cells.

\*<sup>3</sup> Using VDVG.

Assist voltages: $\Delta V_{WLUD} = 100\text{mV}$, $\Delta V_{NBL} = 100\text{mV}$, $V_{SSA} = 100\text{mV}$, $\Delta V_{CVDD} = 100\text{mV}$.

Fig. 17. Difference in read power consumption between all-1 and all-0 data patterns.

### B. Measured Results

To confirm the proposed concept, we designed a testchip with a 256-row 256-kb DSC6T SRAM macro based on a foundry 28-nm 0.127-$\mu\text{m}^2$ 6T SRAM cell. The strapping options employed in the testchip were $p = 8$ and $q = 4$. The testchip features an assist-enable test mode, TAM. When TAM = 0, the DSC function (split WL/split CVSS) is disabled to emulate the behavior of a nominal 6T SRAM: CVSS1 and CVSS2 of the DSC6T macro are connected to the internal VSS, while WL1 is connected to WL2 using conventional WL timing control. When TAM = 1, the SRAM macro operates in DSC mode. The testchip also has a test mode (CVSS\_TM) that selects between an external $V_{SSA}$ source and the CSVG scheme. At CVSS\_TM = 0, the GCVSS of the DSC6T macro is connected to the CVSS\_External pin, so that an external power supply can provide the testchip with various $V_{SSA}$ values. At CVSS\_TM = 1, the GCVSS of the DSC6T macro is connected to the internal CSVG block. Fig. 18 shows the die photos and the structure of the two testchips.
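The two test-mode bits can be summarized as a small behavioral model. The class and function names are hypothetical, and one point is an assumption not stated in the text: CVSS\_TM is treated as a don't-care when TAM = 0, since the CVSS lines are then tied to the internal VSS.

```python
"""Behavioral model of the testchip's TAM / CVSS_TM controls (illustrative only)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroMode:
    dsc_enabled: bool      # split-WL / split-CVSS assist active
    wl1_tied_to_wl2: bool  # True emulates a single conventional WL
    cvss_connection: str   # what drives the CVSS / GCVSS node


def decode_test_modes(tam: int, cvss_tm: int) -> MacroMode:
    if tam not in (0, 1) or cvss_tm not in (0, 1):
        raise ValueError("TAM and CVSS_TM are single-bit controls")
    if tam == 0:
        # Nominal-6T emulation: CVSS1/CVSS2 on internal VSS, WL1 tied to WL2.
        # Assumption: CVSS_TM is a don't-care in this mode.
        return MacroMode(False, True, "internal VSS")
    # DSC mode: GCVSS comes from the external pin or the on-chip CSVG block.
    source = "CVSS_External pin" if cvss_tm == 0 else "internal CSVG"
    return MacroMode(True, False, source)


if __name__ == "__main__":
    for tam in (0, 1):
        for cvss_tm in (0, 1):
            print(tam, cvss_tm, decode_test_modes(tam, cvss_tm))
```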

Fig. 19 shows the measured shmoo plots of the 256-kb DSC6T SRAM macro, which reveal how $V_{SSA}$ affects VDDmin. In this experiment, CVSS\_TM was set to 0 so that CVSS\_External could provide various $V_{SSA}$ values. At TAM = 0, our emulation of a conventional 6T SRAM had a VDDmin of 860 mV for read and write operations. At TAM = 1, increasing $V_{SSA}$ lowered the VDDmin of the DSC6T macro (e.g., VDDmin = 580 mV at $V_{SSA} = 140$ mV). These shmoo tests reveal that the proposed DSC6T (TAM = 1) achieves a VDDmin 280 mV lower than that of a conventional 6T SRAM (TAM = 0) once $V_{SSA}$ reaches 140 mV.
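The quoted improvement is simply the difference between the two extracted VDDmin values, 860 mV − 580 mV = 280 mV. One common way to extract VDDmin from a shmoo column is sketched below: take the lowest VDD at which that point and every higher tested point pass, so an isolated pass inside a failing region is not counted. The function name and the toy sweep are hypothetical and are not the measured data of Fig. 19.

```python
"""VDDmin extraction from one shmoo column (illustrative only)."""
from typing import List


def vddmin_mv(vdd_mv: List[int], passed: List[bool]) -> int:
    """Lowest VDD (mV) such that it and every higher tested VDD pass."""
    if len(vdd_mv) != len(passed):
        raise ValueError("vdd_mv and passed must have the same length")
    lowest = None
    for vdd, ok in sorted(zip(vdd_mv, passed), reverse=True):  # highest VDD first
        if not ok:
            break
        lowest = vdd
    if lowest is None:
        raise ValueError("the highest tested VDD fails; VDDmin is undefined")
    return lowest


if __name__ == "__main__":
    sweep = list(range(500, 901, 20))                  # hypothetical VDD grid (mV)
    column = [v >= 700 or v == 520 for v in sweep]     # toy result with a stray pass
    print("VDDmin =", vddmin_mv(sweep, column), "mV")  # stray 520-mV pass is ignored
```

Running this per $V_{SSA}$ column of a shmoo gives a VDDmin-versus-$V_{SSA}$ curve from which the improvement can be read off.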

Fig. 18. (a) Die photos. (b) Structures of testchip.

Fig. 19. Shmoo plot (a) with no CVSS assist and (b) with an external CVSS source.

Fig. 20 shows the captured waveforms used to evaluate the read access time ($T_{AC}$) at typical and VDDmin voltages. In this test, a built-in D flip-flop timing-extraction scheme [35], [36] was used to exclude the path delay ($T_{PD}$) between the macro and the tester from the chip-level access time ($T_{AC-CHIP}$). In this experiment, CVSS\_TM = 0 was used to apply a precise $V_{SSA}$ value (100 mV) for the extraction of access times. At $V_{DD} = 0.9$ V, the 256-kb DSC6T achieved $T_{AC} = 0.6$ ns; at $V_{DD} = 0.58$ V, it achieved $T_{AC} = 2.2$ ns.
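Conceptually, removing the tester path gives $T_{AC} = T_{AC-CHIP} - T_{PD}$. One generic way to obtain this on chip is to sweep the strobe delay of a D flip-flop placed at the macro output and find the earliest strobe at which it captures correct read data; the sketch below does this with a binary search. This is an assumed, generic strobe-sweep approach and not necessarily the circuit of [35], [36]; the linear delay line, its step, and all names are hypothetical.

```python
"""Strobe-delay search for an on-chip access-time measurement (illustrative only)."""
from typing import Callable


def earliest_passing_code(capture_ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Smallest strobe-delay code in [lo, hi] at which the output DFF captures correct data.

    Assumes monotonic behavior: once a code passes, every later code passes.
    """
    if lo > hi or not capture_ok(hi):
        raise ValueError("no passing strobe code in the search range")
    while lo < hi:
        mid = (lo + hi) // 2
        if capture_ok(mid):
            hi = mid          # mid passes: answer is mid or earlier
        else:
            lo = mid + 1      # mid fails: answer is later
    return lo


def code_to_ns(code: int, step_ps: float, offset_ps: float = 0.0) -> float:
    """Convert a delay code to nanoseconds for a hypothetical linear delay line."""
    return (offset_ps + code * step_ps) / 1000.0


if __name__ == "__main__":
    true_code = 30                              # hypothetical threshold
    code = earliest_passing_code(lambda k: k >= true_code, 0, 63)
    print(f"code {code} -> {code_to_ns(code, step_ps=20.0):.2f} ns")
```

The binary search needs only about log2 of the number of delay codes read trials, which matters when each trial is a full read test.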

Fig. 20. Measured waveform at (a) $V_{DD} = 0.9$ V and (b) $V_{DD} = 0.58$ V.

Fig. 21. Measured VDDmin improvement between different $V_{SSA}$ generation schemes.

Fig. 22. Measured standby current across various VDD.

Fig. 21 presents the measured VDDmin improvement obtained with different $V_{SSA}$ generation schemes. In this experiment, both CVSS\_TM = 0 and CVSS\_TM = 1 were used to compare VDDmin. In the CVSS\_TM = 0 mode, we applied an external voltage source to emulate the $V_{SSA}$ generated by VDVG. In the CVSS\_TM = 1 mode (using the internal CSVG), we applied three data patterns to the SRAM array: all-0, all-1, and half pattern. In the “half pattern,” half of the data on a row are “0” and the other half are “1.” The CVS scheme enables the DSC6T to achieve the lowest VDDmin. With the CSVG scheme, the DSC6T suffers a minor degradation in VDDmin improvement due to data-pattern-dependent fluctuation in $V_{SSA}$. When the SRAM stored the all-0 pattern, the parasitic load on the CVSS2 lines was large, and the $V_{SSA}$ generated by CSVG was smaller than that generated with the half pattern stored. As a result, the VDDmin for the all-0 pattern was 10 mV higher than those of the other two patterns. Nonetheless, the VDDmin improvement provided by CSVG was no more than 20 mV smaller than that provided by VDVG.
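The load dependence of the CSVG output can be illustrated with an ideal two-capacitor charge share. The sketch assumes, as a simplification, that a capacitor $C_S$ precharged to $V_{PRE}$ is connected to CVSS2 lines of total capacitance $C_L$ initially at 0 V, so that $V_{SSA} = C_S V_{PRE}/(C_S + C_L)$; a larger $C_L$ (the all-0 case above) therefore yields a smaller $V_{SSA}$. This is not the detailed CSVG circuit, and the capacitances and voltage are hypothetical.

```python
"""Ideal two-capacitor charge sharing as a first-order CSVG model (illustrative only)."""


def charge_shared_voltage(c_share: float, v_pre: float, c_load: float,
                          v_load0: float = 0.0) -> float:
    """Common node voltage after connecting C_share (at v_pre) to C_load (at v_load0)."""
    return (c_share * v_pre + c_load * v_load0) / (c_share + c_load)


if __name__ == "__main__":
    C_SHARE, V_PRE = 100e-15, 0.3                          # hypothetical generator values
    for label, c_load in (("lighter CVSS2 load", 150e-15),
                          ("heavier CVSS2 load", 250e-15)):  # hypothetical loads
        v = charge_shared_voltage(C_SHARE, V_PRE, c_load)
        print(f"{label}: V_SSA = {v * 1e3:.0f} mV")
```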

Fig. 22 shows the standby current measured on the testchip. The standby current of the DSC6T macro decreases as VDD decreases. At $V_{DD} = 0.6$ V, the standby current of the 256-kb DSC6T macro is 7.8 times lower than that at $V_{DD} = 0.9$ V.
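Because standby power is $P = V \cdot I$, the reduction in standby power is larger than the 7.8× current ratio. Assuming the quoted current is the total standby current drawn from VDD, the power ratio is $(0.9/0.6) \times 7.8 \approx 11.7$. The helper below is a hypothetical illustration of that arithmetic.

```python
"""Standby-power ratio implied by a measured standby-current ratio."""


def standby_power_ratio(v_high: float, v_low: float, current_ratio: float) -> float:
    """Return P(v_high) / P(v_low) with P = V * I and I(v_high) / I(v_low) = current_ratio."""
    return (v_high / v_low) * current_ratio


if __name__ == "__main__":
    ratio = standby_power_ratio(0.9, 0.6, 7.8)   # 7.8x current ratio from Fig. 22
    print(f"Standby power at 0.9 V is about {ratio:.1f}x that at 0.6 V")
```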

## V. CONCLUSION

This paper proposes a DSC6T SRAM cell, including split-WL and split-CVSS schemes, which maintains a compact cell area and improves HS-CS during the read and write cycles without degrading the read speed or WM. Two alternative split-CVSS control schemes, CVS and built-in CSVG, are developed to control the CVSS voltage for the DSC6T. An SVS scheme is developed to suppress the ground bounce (GB) on the split CVSS. A 28-nm 256-kb DSC6T SRAM macro was fabricated to confirm the proposed concepts. The 256-kb DSC6T macro achieves a 280-mV lower VDDmin than a conventional 6T SRAM and a 2.2-ns read access time at $V_{DD} = 0.58$ V.

## ACKNOWLEDGMENT

The authors would like to thank C.-L. Hou and S.-C. Lin (Flash) for their help. They would also like to thank Chip Implementation Center, United Microelectronics Corporation, and Ministry of Science and Technology for their support.

## REFERENCES

- [1] M. Yamaoka, “Low-power embedded SRAM modules with expanded margins for writing,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2005, pp. 480–611.
- [2] K. Zhang *et al.*, “A 3-GHz 70-Mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply,” *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 146–151, Jan. 2006.
- [3] H. Pilo *et al.*, “An SRAM design in 65 nm and 45 nm technology nodes featuring read and write-assist circuits to expand operating voltage,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2006, pp. 15–16.
- [4] K. Nii *et al.*, “A 45-nm bulk CMOS embedded SRAM with improved immunity against process and temperature variations,” *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 180–191, Jan. 2008.
- [5] S. Ohbayashi *et al.*, “A 65-nm SoC embedded 6T-SRAM designed for manufacturability with read and write operation stabilizing circuits,” *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 820–829, Apr. 2007.
- [6] K. Sohn *et al.*, “A 100 nm double-stacked 500 MHz 72 Mb separate-I/O synchronous SRAM with automatic cell-bias scheme and adaptive block redundancy,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2008, pp. 386–622.
- [7] O. Hirabayashi *et al.*, “A process-variation-tolerant dual-power-supply SRAM with  $0.179 \mu\text{m}^2$  cell in 40 nm CMOS using level-programmable wordline driver,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2009, pp. 458–459.
- [8] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, “A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses,” *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2109–2119, Sep. 2008.
- [9] Y. Morita *et al.*, “An area-conscious low-voltage-oriented 8T-SRAM design under DVS environment,” in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2007, pp. 256–257.
- [10] N. Verma and A. P. Chandrakasan, “A 65 nm 8T sub-V<sub>t</sub> SRAM employing sense-amplifier redundancy,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 328–606.
- [11] I. J. Chang, J. J. Kim, S. P. Park, and K. Roy, “A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2008, pp. 388–622.
- [12] Y. H. Chen *et al.*, “A 0.6V 45 nm adaptive dual-rail SRAM compiler circuit design for lower V<sub>DD\_min</sub> VLSIs,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2008, pp. 210–211.
- [13] M. Khellah *et al.*, “A 4.2 GHz  $0.3 \mu\text{m}^2$  256 kb dual-V<sub>cc</sub> SRAM building block in 65 nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2006, pp. 2572–2581.
- [14] M. Yamaoka, K. Osada, and K. Ishibashi, “0.4-V logic-library-friendly SRAM array using rectangular-diffusion cell and delta-boosted-array voltage scheme,” *IEEE J. Solid-State Circuits*, vol. 39, no. 6, pp. 934–940, Jun. 2004.
- [15] N. Shibata, H. Kiya, S. Kurita, H. Okamoto, M. Tan'no, and T. Douseki, “A 0.5-V 25-MHz 1-mW 256-kb MTCMOS/SOI SRAM for solar-power-operated portable personal digital equipment—Sure write operation by using step-down negatively overdriven bitline scheme,” *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 728–742, Mar. 2006.
- [16] D. P. Wang *et al.*, “A 45 nm dual-port SRAM with write and read capability enhancement at low voltage,” in *Proc. IEEE Int. SOC Conf.*, Sep. 2007, pp. 211–214.
- [17] M. Khellah *et al.*, “Wordline & bitline pulsing schemes for improving SRAM cell stability in low-V<sub>cc</sub> 65 nm CMOS designs,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2006, pp. 9–10.
- [18] K. Kushida *et al.*, “A 0.7 V single-supply SRAM with 0.495  $\mu\text{m}^2$  cell in 65 nm technology utilizing self-write-back sense amplifier and cascaded bit line scheme,” *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1192–1198, Apr. 2009.
- [19] M.-B. Chen *et al.*, “A 260 mV L-shaped 7T SRAM with bit-line (BL) swing expansion schemes based on boosted BL, asymmetric-VTH read-port, and offset cell VDD biasing techniques,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2012, pp. 112–113.
- [20] L. Chang *et al.*, “Stable SRAM cell design for the 32 nm node and beyond,” in *IEEE Symp. VLSI Technol., Dig. Tech. Papers*, Jun. 2005, pp. 128–129.
- [21] S. Ishikura *et al.*, “A 45 nm 2-port 8T-SRAM using hierarchical replica bitline technique with immunity from simultaneous R/W access issues,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2007, pp. 254–255.
- [22] R. Joshi *et al.*, “6.6+ GHz low V<sub>min</sub>, read and half select disturb-free 1.2 Mb SRAM,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2007, pp. 250–251.
- [23] L. Chang *et al.*, “An 8T-SRAM for variability tolerance and low-voltage operation in high-performance caches,” *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 956–963, Apr. 2008.
- [24] T.-H. Kim, J. Liu, and C. H. Kim, “A voltage scalable 0.26 V, 64 kb 8T SRAM with V<sub>min</sub> lowering techniques and deep sleep mode,” *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1785–1795, Jun. 2009.
- [25] J.-J. Wu *et al.*, “A large  $\sigma V_{TH}/VDD$  tolerant zigzag 8T SRAM with area-efficient decoupled differential sensing and fast write-back scheme,” *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 815–827, Apr. 2011.
- [26] M. E. Sinangil, N. Verma, and A. P. Chandrakasan, “A 45 nm 0.5 V 8T column-interleaved SRAM with on-chip reference selection loop for sense-amplifier,” in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, Nov. 2009, pp. 225–228.
- [27] M.-H. Tu *et al.*, “A single-ended disturb-free 9T subthreshold SRAM with cross-point data-aware write word-line structure, negative bit-line, and adaptive read operation timing tracing,” *IEEE J. Solid-State Circuits*, vol. 47, no. 6, pp. 1469–1482, Jun. 2012.
- [28] S.-M. Yang, M.-F. Chang, C.-C. Chiang, M.-P. Chen, and H. Yamauchi, “Low-voltage embedded NAND-ROM macros using data-aware sensing reference scheme for VDDmin, speed and power improvement,” *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 611–623, Feb. 2013.
- [29] M. F. Chang *et al.*, “A high-speed 7.2-ns read-write random access 4-Mb embedded resistive RAM (ReRAM) macro using process-variation-tolerant current-mode read schemes,” *IEEE J. Solid-State Circuits*, vol. 48, no. 3, pp. 878–891, Mar. 2013.
- [30] Y.-H. Chen *et al.*, “A 16 nm 128 Mb SRAM in high- $\kappa$  metal-gate FinFET technology with write-assist circuitry for low-V<sub>MIN</sub> applications,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 238–239.
- [31] E. Karl *et al.*, “A 4.6 GHz 162 Mb SRAM design in 22 nm tri-gate CMOS technology with integrated active V<sub>MIN</sub>-enhancing assist circuitry,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 230–232.
- [32] T. Song *et al.*, “A 14 nm FinFET 128 Mb SRAM with V<sub>MIN</sub> enhancement techniques for low-power applications,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 232–233.
- [33] J. Chang *et al.*, “20 nm 112 Mb SRAM in high- $\kappa$  metal-gate with assist circuitry for low-leakage and low-V<sub>MIN</sub> applications,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 316–317.
- [34] H. Pilo *et al.*, “A 64 Mb SRAM in 32 nm high- $\kappa$  metal-gate SOI technology with 0.7V operation enabled by stability, write-ability and read-ability enhancements,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, pp. 254–256.
- [35] S.-S. Sheu *et al.*, “A 4 Mb embedded SLC resistive-RAM macro with 7.2 ns read-write random-access time and 160 ns MLC-access capability,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, pp. 200–201.
- [36] M.-F. Chang *et al.*, “A 0.5 V 4 Mb logic-process compatible embedded resistive RAM (ReRAM) in 65 nm CMOS using low-voltage current-mode sensing scheme with 45 ns random read time,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 434–435.



**Meng-Fan Chang** (M'05–SM'14) received the M.S. degree from Pennsylvania State University, University Park, PA, USA, and the Ph.D. degree from National Chiao Tung University, Hsinchu, Taiwan.

He has over ten years of industry experience. From 1996 to 1997, he was with Mentor Graphics, NJ, USA, where he designed memory compilers. From 1997 to 2001, he was with the Design Service Division, Taiwan Semiconductor Manufacturing Company, Hsinchu, where he designed embedded SRAMs and Flash. From 2001 to 2006, he was a Co-Founder and the Director of IPLib Company, where he developed embedded SRAM and ROM compilers, Flash macros, and flat-cell ROM products. He is currently a Full Professor with National Tsing Hua University (NTHU), Hsinchu. His current research interests include circuit designs for volatile and nonvolatile memory, ultralow-voltage systems, 3-D memory, circuit-device interactions, memristor logics for neuromorphic computing, and computing-in-memory.

Dr. Chang received the Academia Sinica (Taiwan) Junior Research Investigators Award in 2012 and the Ta-You Wu Memorial Award of the National Science Council (Taiwan) in 2011. He has also received numerous awards from Taiwan's National Chip Implementation Center, NTHU, MXIC Golden Silicon Awards, and ITRI. He is the corresponding author of numerous International Solid-State Circuits Conference (ISSCC), Symposium on VLSI Circuits, International Electron Devices Meeting (IEDM), and DAC papers. He has served on Technical Program Committees for ISSCC, IEDM, IEEE Asia Solid-State Circuits Conference, International Symposium on Circuits and Systems, International Symposium on Design Automation and Testing, and numerous international conferences. He has been a Distinguished Lecture Speaker for the IEEE Circuits and Systems Society. He served as the Associate Executive Director of Taiwan's National Program of Intelligent Electronics from 2011 to 2016. He is an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, and IEICE Electronics.



**Chien-Fu Chen** received the B.S. and M.S. degrees in electrical engineering from National Tsing Hua University in 2009 and 2013, respectively. He is currently pursuing the Ph.D. degree with the University of Wisconsin-Madison, Madison, WI, USA.

His current research interests include low-voltage VLSI design, machine learning, and parallel computing.



**Yi-Ju Chen** received the B.S. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2015, where he is currently pursuing the M.S. degree in electrical engineering.

His current research interests include circuit design of embedded SRAM.



**Ting-Hao Chang** received the B.S. degree in electronics engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2011, and the M.S. degree in electronics engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2014.

He is currently an SRAM Engineer with United Microelectronics Corporation, Hsinchu. His current research interests include SRAM design.



**Chi-Chang Shuai** received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1993, and the M.S. degree in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1995.

He is currently the SRAM Department Manager with United Microelectronics Corporation, Hsinchu. His current research interests include SRAM and embedded DRAM design.



**Yen-Yao Wang** was born in 1991. He received the M.S. degree in electronics engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2015.

He is currently an SRAM Circuit Design Engineer with United Microelectronics Corporation. His current research interests include low-power and high-speed SRAM design.



**Hiroyuki Yamauchi** received the Ph.D. degree in engineering from Kyushu University, Fukuoka, Japan, in 1997.

In 1985, he joined the Semiconductor Research Center, Panasonic, Osaka, Japan. From 1985 to 1987, he worked on the scaled sense amplifier for ultrahigh-density DRAMs, which was presented at the 1989 Symposium on VLSI Circuits. From 1988 to 1994, he was involved in the research and development of 16-Mb CMOS DRAMs, including the battery-operated high-speed 16-Mb CMOS DRAM and the ultralow-power, three-times-longer self-refresh DRAM, which were presented at the 1993 and 1995 International Solid-State Circuits Conference (ISSCC), respectively. He also reported the charge-recycling bus architecture and low-voltage high-speed VLSIs, including a 0.5-V/100-MHz SRAM and a gate-over-driving CMOS architecture, which were presented at the Symposium on VLSI Circuits in 1994 and 1996, respectively, and at the 1997 ISSCC. After serving as a General Manager for the development of various embedded memories (eSRAM, eDRAM, eFlash, eFeRAM, and eReRAM) for system LSI at Panasonic, he moved to the Fukuoka Institute of Technology, Fukuoka, where he has been a Professor since 2005. He holds 212 patents, including 87 U.S. patents, and has published over 70 journal and international conference papers, including ten at ISSCC and 11 at the Symposium on VLSI Circuits. His current research interests include machine-learning-based fault-tolerant memory circuit designs for Internet of Things sensor applications.

Dr. Yamauchi received the 1996 Remarkable Invention Award from the Science and Technology Agency of Japanese Government, and the highest ISOCC2008 Best Paper Award and the ISOCC2013 IEEK Best Paper Award.