

# A ReRAM-Based Nonvolatile Flip-Flop With Self-Write-Termination Scheme for Frequent-OFF Fast-Wake-Up Nonvolatile Processors

Albert Lee, Chieh-Pu Lo, Chien-Chen Lin, Wei-Hao Chen, Kuo-Hsiang Hsu, Zhibo Wang, Fang Su, Zhe Yuan, Qi Wei, Ya-Chin King, Chrong-Jung Lin, *Senior Member, IEEE*, Hochul Lee, *Student Member, IEEE*, Pedram Khalili Amiri, *Member, IEEE*, Kang-Lung Wang, *Fellow, IEEE*, Yu Wang, *Senior Member, IEEE*, Huazhong Yang, *Senior Member, IEEE*, Yongpan Liu, *Senior Member, IEEE*, and Meng-Fan Chang, *Senior Member, IEEE*

**Abstract**—Nonvolatile flip-flops (nvFFs) enable frequent-off processors to achieve fast power-off and wake-up time while maintaining critical local computing states through parallel data movement between volatile FFs and local nonvolatile memory (NVM) devices. However, current nvFFs suffer from large store energy ( $E_S$ ) and long voltage stress time on the device ( $T_{\text{STRESS}}$ ), due to the wide distribution in the write time of NVM devices as well as unnecessary writes. Moreover, heavy parasitic load on the power rail causes long wake-up time for restoring data from NVM to FFs. This paper proposes a resistive RAM (ReRAM)-based nvFF with self-write termination (SWT) and reduced loading on the power rail to: 1) reduce 93+% waste of $E_S$ from fast-switching or matched cells; 2) suppress endurance and reliability degradation resulting from overprogramming and long $T_{\text{STRESS}}$; and 3) achieve reliable and 26+ times faster restore operation compared with previous nvFFs. We have fabricated a nonvolatile processor and a test chip with SWT-nvFFs using logic-process ReRAM in a 65-nm CMOS process. Measured results show sub-2-ns termination response time and sub-20-ns chip-level restore time.

**Index Terms**—Flip-flop (FF), nonvolatile logic, nonvolatile processor (nv-Processor), resistive RAM (ReRAM).

## I. INTRODUCTION

Despite the ever-increasing demand for energy efficiency, standby power continues to increase as dimensions scale. Long operational lifetime and rapid, reliable recovery are essential in ultra-low energy applications, such as smart mobile devices, wearable devices, wireless sensor networks, and the Internet of Things. Employing nonvolatile memory (NVM) to retain system states and using power-OFF is a common approach for extending the lifetime of a device through eliminating standby power. During this period, data in on-chip SRAM and flip-flops (FFs) are moved to the NVM macros to ensure system stability after power interruption [1]–[5]. However, this MCU + NVM two-macro approach results in great overhead in delay, energy, and system complexity during the store (power-OFF) and restore (wake-up) operations, especially for registers distributed across the processor that do not allow random access. These overheads greatly degrade the standby-power reduction of the two-macro approach and thus result in long break-even time (BET).

Manuscript received December 21, 2016; revised March 19, 2017; accepted April 23, 2017. Date of publication June 28, 2017; date of current version July 20, 2017. This paper was approved by Associate Editor Hideto Hidaka. (Corresponding author: Meng-Fan Chang.)

A. Lee, H. Lee, P. Khalili Amiri, and K.-L. Wang are with the University of California at Los Angeles, Los Angeles, CA 90095 USA.

C.-P. Lo, C.-C. Lin, W.-H. Chen, K.-H. Hsu, Y.-C. King, C.-J. Lin, and M.-F. Chang are with National Tsing Hua University, Hsinchu 30013, Taiwan (e-mail: mfchang@ee.nthu.edu.tw).

Z. Wang, F. Su, Z. Yuan, Q. Wei, Y. Wang, H. Yang, and Y. Liu are with Tsinghua University, Beijing 100084, China.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2700788

Fig. 1. Conceptual view of conventional two-macro solution versus non-volatile logics.

Recent advances in CMOS-compatible NVM devices have led to the development of nonvolatile logics [2], [6]–[16], which merge NVM into CMOS circuitry in FFs, SRAMs, or TCAMs to enable parallel data movement between NVM devices and volatile storage elements for fast, local, and low-energy store/restore operations. Fig. 1 compares an energy-efficient system using the two-macro approach with a system employing nonvolatile logic.

Each emerging NVM involves trade-offs. Ferroelectric RAM (FeRAM) [6], [7], [17], [18] has the advantage of low write energy, but suffers from small restore margins and large device area. Magnetic RAM (MRAM) [8], [19], [20] has high endurance and fast write time, but suffers from small tunneling magnetoresistance ratios (TMRs) and extremely low resistances. Phase change memory [21]–[23] has a large resistive ratio ( $R$ -ratio), yet high programming current density and heating issues limit its usage inside processors. Resistive RAM (ReRAM) [24]–[27], [37] has a low manufacturing cost, acceptable write power, and high resistance ( $R$ )-ratio ( $R$ -ratio = $R_{\text{HRS}}/R_{\text{LRS}}$ ) between high- $R$ (HRS, $R_{\text{HRS}}$ ) and low- $R$ (LRS, $R_{\text{LRS}}$ ) states. However, its physical mechanisms result in slightly worse endurance and variation.

Nonvolatile FFs (nvFFs) [6]–[8] have been proposed to store critical local computing states with faster power-OFF and wake-up time than the conventional two-macro approach for frequent-OFF processors. Previous nvFFs employ the worst case programming conditions (write time $T_{\text{STORE\_MAX}}$, voltage $V_{\text{STORE\_MAX}}$, and current $I_{\text{STORE\_MAX}}$ ) to ensure switching of the tail NVM devices. Unfortunately, many NVM devices have a wide distribution in write time. Moreover, a small signal ratio ( $C$ -ratio, TMR, or $R$ -ratio) requires assist schemes for restore operations. Thus, previous nvFFs have faced the following challenges:

- 1) excessive waste of store energy in fast-switching, over-programmed, and matched cells, where a “matched cell” refers to a cell with NVM devices already in the same state as the to-be-written data;
- 2) degraded resistance ratio, endurance, and reliability as well as low  $R_{\text{LRS}}$  ( $R_{\text{LRS\_MIN}}$ )-induced write failure due to long  $T_{\text{STRESS}}$  and overwrite of fast-switch and matched cells;
- 3) long wake-up time and large wake-up power due to excessive capacitive loading [6], [7] or dc-current [8] on the power rail in order to achieve high restore yield.

Details of the above-mentioned challenges will be discussed in Section II.

The remainder of this paper is organized as follows. Section II provides background information related to nvFF. Section III describes the proposed self-write-termination (SWT)-nvFF circuit and operations. Section IV analyzes the energy performance of the SWT-nvFF cell and addresses concerns regarding reliability and overhead. The full-system implementation, measured results, and comparison to previous works are also given in this section. Finally, the conclusions are drawn in Section V.

## II. BACKGROUND

### A. Operation of Employed ReRAM Device

To prevent excessive waste of energy, nonvolatile processors (nv-Processors) only execute store and power-OFF during long standby periods, whereas dynamic voltage scaling and multiple-power domains [4], [5] are more efficient for short standby periods. Most normally OFF devices are activated only a few times every few hours or one time per day. On the other hand, the restore operation must be fast and reliable for event-driven computing. The endurance requirements of the NVM device are not particularly strict; however, a sufficient ON–OFF ratio is required to overcome offset in the CMOS latches caused by process variation. ReRAM devices are promising candidates for nv-Processors, due to their high  $R$ -ratio, relatively low write current, sufficient endurance, and low manufacturing cost.

To achieve high density and reduce the cost of manufacturing, this paper employed a TiN-TiON-based contact ReRAM sandwiched between the first metal and n+ diffusion region inside a contact hole [24]. Contact ReRAM eliminates the need for an additional connection from the top metal to the lower metal circuitry, as required for conventional ReRAM placed between high metal layers. Fig. 2 presents the characteristics of the unipolar ReRAM device employed in this paper. The endurance of the employed ReRAM device can exceed $10^6$ cycles without a significant shift in resistance, and the minimum $R$ -ratio exceeds 4. The ReRAM device stores data in two resistive states: low resistive state (LRS) and high resistive state (HRS). The SET operation switches the ReRAM resistance to $R_{\text{LRS}}$ by applying a voltage of $V_{\text{SET}}$ for the duration of $T_{\text{SET}}$. The RESET operation switches the device back to $R_{\text{HRS}}$ by applying a voltage of $V_{\text{RESET}}$ for the duration of $T_{\text{RESET}}$. As with most NVM devices, the $T_{\text{SET}}/T_{\text{RESET}}$ and $R_{\text{HRS}}/R_{\text{LRS}}$ of the ReRAM vary considerably [28]–[30] due to process variations.

Fig. 2. Characteristics of employed RRAM device. (a) Operation conditions. (b) Device structure. (c) Resistance distribution. (d) Programming time distribution.

### B. Challenges in Application of Nonvolatile Flip-Flop

1) *Energy and Reliability Concerns*: In NVM macros, error-correction code (ECC) blocks are used to deal with write failure caused by insufficient  $T_{\text{STORE}}$  in slow-switch cells. However, due to the data path and the area of FFs, nvFF cannot afford the overhead of using ECC and must, therefore, employ the worst case  $T_{\text{STORE}}$  ( $T_{\text{SET\_MAX}}$  and  $T_{\text{RESET\_MAX}}$ ) to ensure successful write of all NVM cells.

For fast-switching cells that switch before $T_{\text{STORE\_MAX}}$ (shorter $T_{\text{STORE}}$ ) or matched cells that do not need switching, the SET/RESET biases are continuously applied to the cell despite the cell already being in the LRS/HRS state.



Fig. 3. Voltage and current waveforms across the RRAM device during store operations.

Applying the longest  $T_{STORE}$  to these cells wastes power and the long  $T_{STRESS}$  results in over-SET/over-RESET. The over-SET [31], [32] behavior makes HRS cells switch to LRS with ultra-low  $R_{LRS}$ , resulting in subsequent RESET failure due to insufficient bias across the ReRAM device under given RESET conditions. The over-RESET [33], [34] behavior makes an HRS cell suffer degraded reliability and subsequent SET failure. These issues, as shown in Fig. 3, have become increasingly severe as variations increase with device dimension scaling.

By terminating the NVM-write biases when an NVM cell reaches its target $R_{HRS}/R_{LRS}$, write termination (WT) provides an effective way to overcome challenge (1) in a memory macro [25], [26], [35]. However, the large area overhead of the conventional operational amplifier-based WT scheme [26] for memory applications prevents it from being applied to nvFFs. Thus, there is a need for an efficient self-WT (SWT) scheme with reduced area overhead for nvFFs.

2) *Restore Performance Concerns*: An insufficient difference (or ratio) between the ON- and OFF-states of the NVM device may lead to restore failure when this margin fails to overcome the mismatch of the latch pair due to process variations in transistors.

To achieve higher restore yield, FeRAM-capacitor-based nvFFs need to increase restore margins through assist methods.

Charge-dump-nvFF [6] dumps charge from large-capacitance FeRAM cells ( $\sim 380$ fF/cell) into small-capacitance storage nodes ( $\sim 10$ fF) of a CMOS latch, which increases the voltage difference between $Q$ and $QB$. Differential-cap-nvFF [7] uses two FeRAMs on each storage node (i.e., four FeRAMs per cell) to increase the restore margin by $2\times$ through capacitance division. In equalization-nvFF (EQ\_nvFF) [8], the slave latch is equalized during restore to decrease the effect of PMOS mismatch. As a result, previous nvFFs suffered from long power-rail ( $V_{DD\_FF}$ ) ramp-up times ( $T_{VDDFF\_RMP}$ ), due to a large capacitive or current load (dc short caused by the latch equalization) on $V_{DD\_FF}$. Table I shows an overview of previous silicon-verified nvFFs.

## III. PROPOSED SELF-WRITE-TERMINATION NONVOLATILE FLIP-FLOP

This paper presents an SWT-nvFF that uses the intrinsic positive feedback of the slave latch to achieve fast and reliable termination in a compact area. To reduce the system wake-up time and restore energy when the number of nvFFs in a chip is large, the NVM devices are decoupled from the power rail.

### A. Structure of the SWT-nvFF

Fig. 4 presents the schematic and operation of the proposed SWT-nvFF, comprising a master latch (M-Latch), a dual-mode slave latch ( $M0-M1$  and  $M2-M3$ ), and an NVM-control unit (NVMCU). The NVMCU comprises two NVM devices (RL and RR), two isolation transistors ( $M6$  and  $M7$ ), two dual-mode switches ( $SW1$  and  $SW2$ ), store/restore polarity selectors ( $M10$  and  $M11$ ), a self-termination circuit for RESET ( $M8-M9$  and  $M12-M14$ ), and a self-termination bias circuit for SET ( $M4-M5$ ). The SWT-nvFF has three modes: FF, store (NVM-write), and restore (wake up). In the FF mode, the dual-mode switches  $SW1$  and  $SW2$  are ON to connect  $M0-M3$  to form a cross-coupled latch, as in a typical FF. Under  $RESET = 0$ , a lack of voltage stress on RL/RR enhances the reliability of NVM devices. Store operations (including RESET and SET phases) move data from the slave latch into the NVMs before power-OFF, and restore operation loads the data from the NVMs back into the slave latch at power-ON.

### B. Store Operation—RESET

Fig. 5 presents the Store-RESET operation. At the start of the RESET operation ( $STOREB = 0$  and  $RESET = 1$ ),  $RST\_FB$  is high to form a resistive division path between  $M10$ ,  $M14$ , and RL- $M8$ /RR- $M9$ . The selection of RL- $M8$  or RR- $M9$  path is determined by the data at  $Q/QB$ .  $M12-M13$  then detects the voltage at node  $NX$  ( $V_{NX}$ ) and generates feedback bias ( $RST\_FB$ ) for the write driver ( $M14$ ). RR is selected when  $Q = 0$ . If RR is LRS (mismatch), then the drop in  $V_{NX}$  turns ON  $M12$ , such that  $RST\_FB = 1$  and  $M14$  remains ON to provide RESET current ( $I_{RESET}$ ). When the RR switches from an LRS to HRS, an increase in  $V_{NX}$  lowers  $RST\_FB$ , which further raises  $V_{NX}$ . Positive feedback

TABLE I  
PREVIOUS WORKS ON nvFFs EMBEDDED IN nv-PROCESSORS

| Name                           | Charge-Dump nvFF [6]                | Differential-cap nvFF [7] | Equalization-nvFF [8]                 | This work               |
|--------------------------------|-------------------------------------|---------------------------|---------------------------------------|-------------------------|
| Schematic                      |                                     |                           |                                       |                         |
| NVM Device                     | FeRAM                               | FeRAM                     | MTJ                                   | ReRAM                   |
| CMOS Technology                | 130nm                               | 130nm                     | 90nm                                  | 65nm                    |
| Additional Devices             | 22T+2C                              | 18T+4C                    | 18T+2MTJ                              | 15T+2R                  |
| Write Termination              | No                                  | No                        | No                                    | **Yes**                 |
| Restore Method                 | Charge-dump to storage node         | Capacitance division      | Latch-equalization + R-discharge-CVSS | R-pulldown storage node |
| Bottlenecks for Restore margin | Charge amount and charge difference | Capacitance Ratio         | R-ratio (TMR-ratio)                   | R-ratio                 |



Fig. 4. (a) Full schematic of the SWT-nvFF, (b) operation control table, and (c) functional waveforms for a full power-OFF-ON cycle.

between $M12$, $M13$, and $M14$ leads to $RST\_FB = 0$ and the switching OFF of $M14$ and $I_{RESET}$. When the selected RL/RR is already in HRS (match), a high $V_{NX}$ ensures that $M12$ is OFF and $RST\_FB = 0$, thereby turning OFF $M14$/$I_{RESET}$ and ensuring that the RESET operation does not occur. Note that for the RESET operation, the feedback circuit ( $M12-M14$ ) uses a voltage input, since the RESET operation is a current-based operation.

Fig. 5. (a) Active transistors and (b) operation waveform for the Store-RESET operation.

Fig. 6. (a) Active transistors and (b) operation waveform for the Store-SET operation.
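The RESET self-termination decision described above can be summarized as a small behavioral sketch. The Python model below is illustrative only and is not a circuit simulation: the function name, the `"LRS"`/`"HRS"` state encoding, and the per-cell switching time `t_switch_ns` are assumptions introduced here, and the termination response time is ignored. It captures just the logic in the text: the device is selected by $Q$, a matched (HRS) cell receives no RESET current, and a mismatched (LRS) cell is stressed only until it switches.

```python
def reset_phase(q, rl_state, rr_state, t_switch_ns, t_store_max_ns):
    """Behavioral sketch of the Store-RESET phase (not a circuit simulation).

    q              -- slave-latch data (0 or 1); RR is selected when q == 0
    rl_state       -- "LRS" or "HRS" for device RL
    rr_state       -- "LRS" or "HRS" for device RR
    t_switch_ns    -- hypothetical time the selected cell needs to reach HRS
    t_store_max_ns -- worst-case RESET pulse width
    Returns (rl_state, rr_state, stress_time_ns) after the phase.
    """
    states = {"RL": rl_state, "RR": rr_state}
    selected = "RR" if q == 0 else "RL"

    if states[selected] == "HRS":
        # Match: V_NX stays high, RST_FB = 0, M14 is OFF, no RESET current.
        return states["RL"], states["RR"], 0.0

    # Mismatch: RST_FB = 1 keeps M14 ON until the cell switches to HRS.
    if t_switch_ns <= t_store_max_ns:
        states[selected] = "HRS"
        stress = float(t_switch_ns)  # feedback cuts I_RESET after switching
    else:
        stress = float(t_store_max_ns)  # cell too slow: pulse ends unswitched
    return states["RL"], states["RR"], stress
```

For example, `reset_phase(0, "HRS", "LRS", 50, 1000)` returns `("HRS", "HRS", 50.0)`, whereas a fixed worst-case pulse would stress the same cell for the full 1000 ns.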

### C. Store Operation—SET

Fig. 6 presents the Store-SET operation. When the RESET operation ends, $\text{SET} = \text{RSWL} = 1$, which turns OFF $\text{SW1/SW2}$ and turns ON $M4/M5$ to produce two SET-feedback circuits, SFB1 ( $M0-M3-M4$ ) and SFB2 ( $M1-M2-M5$ ). When $Q = 0$, SFB1 is selected and SFB2 is disabled ( $QB = 1$, $Q1 = 0$, and $M1 = M2 = \text{OFF}$ ), while the $M10-\text{RR}(\text{HRS})-M7$ path weakly charges $QB$ to prevent $QB$ from floating. When RL is in HRS (mismatch), $M3$ keeps $Q$ low to provide SET current $I_{SET}$ ( $< I_{SET-MAX}$ ) and SET voltage $V_{SET}$ across RL. When RL switches from HRS to LRS, the low $R_{LRS}$ generates a large dc current $I_{SET-dc}$ ( $> I_{SET-MAX}$ ), which causes $Q$ to rise, thereby lowering the voltage at QB1 via the $M0$-$M4$ voltage-divider pair and further raising $V_Q$. When the resistance of RL falls below the designed $R_{LRS-TH}$, positive feedback turns OFF $M3$, such that the SET operation is terminated. When RL is already in LRS (match), the large current $I_{SET-dc}$ ( $> I_{SET-MAX}$ ) causes $Q$ to switch from 0 to 1, thereby lowering QB1 and quickly turning OFF $M3$ to end the SET operation. When $Q = 1/QB = 0$, $V_{NX}$ is high and the SET operation switches RR to LRS. In this case, the feedback circuit ( $M0$-$M5$ ) uses a current input, since the SET operation is a voltage-based operation.

Fig. 7. (a) Active transistors and (b) operation waveform for the restore operation.
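The SET termination condition, in which the bias is cut once the device resistance falls below $R_{LRS-TH}$, can be sketched in the same behavioral style. The sampled resistance traces and the 25-kΩ threshold used below are hypothetical values chosen for illustration, and the sketch assumes ideal feedback with no response delay.

```python
def set_termination_index(resistance_trace_ohm, r_lrs_th_ohm):
    """Return the first sample index at which the SET bias would be cut off.

    resistance_trace_ohm -- hypothetical resistance samples of the selected
                            device while the SET bias is applied
    r_lrs_th_ohm         -- designed termination threshold (R_LRS-TH)
    Returns None if the device never drops below the threshold.
    """
    for index, resistance in enumerate(resistance_trace_ohm):
        if resistance < r_lrs_th_ohm:
            # Large I_SET-dc flips Q; positive feedback then turns OFF M3.
            return index
    return None


# Mismatched cell: starts in HRS and switches partway through the pulse.
mismatch_trace = [100e3, 100e3, 95e3, 18e3, 15e3]
# Matched cell: already in LRS, so termination occurs at the first sample.
match_trace = [15e3, 15e3, 15e3]
```

With a threshold of `25e3`, the mismatched trace terminates at index 3 and the matched trace at index 0, mirroring the early termination of matched cells described in the text.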

### D. Restore of the SWT-nvFF

In restore mode, the application of  $RSWL = \text{Restore} = 1$  enables nodes  $Q$  and  $QB$  to have differential discharge currents flowing through RL ( $I_Q$ ) and RR ( $I_{QB}$ ), respectively. The LRS (HRS) discharge path provides a larger (smaller)  $I_Q/I_{QB}$ , resulting in a lower (higher) voltage at node  $Q/QB$  ( $V_Q/V_{QB}$ ), whereupon the slave latch pulls the higher (lower) voltage at  $Q/QB$  to logic-1 (logic-0). The active circuits during this restore period are shown in Fig. 7.
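The differential-discharge principle can be illustrated with a first-order model. The sketch below is an assumption-laden simplification rather than the authors' simulation setup: it treats each storage node as a single capacitor discharging through its ReRAM device from a common starting voltage, uses an ideal offset-free latch, and picks hypothetical defaults (10-fF node capacitance, 0.2-ns sensing time, 20-kΩ/100-kΩ device resistances).

```python
import math


def restore_q(rl_ohm, rr_ohm, c_node_f=10e-15, t_sense_s=0.2e-9, v_start=1.0):
    """Resolve the restored value of Q from the two device resistances.

    Q discharges through RL and QB through RR; the node that stays higher
    is pulled to logic-1 by the (ideal, offset-free) slave latch.
    All default parameter values are hypothetical.
    """
    v_q = v_start * math.exp(-t_sense_s / (rl_ohm * c_node_f))
    v_qb = v_start * math.exp(-t_sense_s / (rr_ohm * c_node_f))
    return 1 if v_q > v_qb else 0
```

With RL in LRS and RR in HRS, `restore_q(20e3, 100e3)` returns 0 because $Q$ discharges faster than $QB$; swapping the two resistances returns 1.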

In contrast to FeRAMs and MRAMs, the suitable resistance range and sufficient $R$ -ratio of our ReRAMs eliminate the need for additional assist schemes to increase restore margin and achieve high restore yield. This avoids the excess capacitance/dc-current loading on the power rail, which allows our ReRAM-based nvFFs to achieve fast wake-up and low restore energy.

Fig. 8. Store energy of cells with different $T_{STORE}/T_{STORE-MAX}$.

## IV. ANALYSIS AND MEASUREMENT RESULTS

In this section, we perform analysis on the energy, speed, yield, and area of the SWT-nvFF. We then present the implementation in an nv-Processor and the measured results of a test chip.



Fig. 9. Store energy for different match percentages.



Fig. 10. Energy comparison of standby and power-OFF-ON operations.

### A. Performance

Fig. 8 presents the store energy ( $E_S$ ) when writing cells with different  $T_{STORE}$ . Without the SWT scheme, the write biases are supplied throughout  $T_{STORE-MAX}$ , and nearly 80% of  $E_S$  is wasted as dc current even for an average slow-switching cell ( $T_{STORE-MAX}/T_{STORE} = 10$ ). As the cell switches faster, an increased amount of energy is wasted during the store cycle.

With a small bias current, the SWT scheme efficiently cuts off the store conditions within picoseconds after switching, eliminating the dc current, and achieving a 76% reduction in  $E_S$  for slow-switching cells. For cells with typical switching time ( $T_{STORE-MAX}/T_{STORE} = 40$ ) and fast-switching cells ( $T_{STORE-MAX}/T_{STORE} = 100$ ), a 93% and 97% reduction in  $E_S$  can be achieved, respectively.

Fig. 9 presents the store energy consumption under different percentages of matched cells. When 0% of the cells match, all cells need to be written and the SWT-nvFF saves 86% energy, mainly due to the termination of fast-switching cells. On the other hand, for a 100% match, all cells keep their previous values and no write operation is required. Under this condition, the SWT saves more than 99.9% energy by terminating matched cells. It is worth noting that embedded benchmarks have an average match rate above 78% [36].
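As a rough illustration of how the two endpoint figures combine at an intermediate match rate, the sketch below linearly mixes the remaining-energy fractions of unmatched and matched cells. This interpolation is an assumption made here, not a result from Fig. 9: it presumes that the baseline (no-SWT) energy per cell is the same for matched and unmatched cells, and it uses 99.9% as a lower bound for the all-match saving. The function name and defaults are hypothetical.

```python
def swt_energy_saving(match_fraction, saving_no_match=0.86,
                      saving_all_match=0.999):
    """Estimate the SWT store-energy saving at a given match fraction.

    Assumes (hypothetically) equal baseline energy per cell, so the
    remaining-energy fractions of unmatched and matched cells mix linearly.
    """
    if not 0.0 <= match_fraction <= 1.0:
        raise ValueError("match_fraction must lie in [0, 1]")
    remaining = ((1.0 - match_fraction) * (1.0 - saving_no_match)
                 + match_fraction * (1.0 - saving_all_match))
    return 1.0 - remaining
```

Under these assumptions, a 78% match rate gives `swt_energy_saving(0.78)` of roughly 0.968, i.e., between the two reported endpoints; the measured curve need not follow this linear estimate.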

Fig. 10 presents a comparison of total energy consumption in standby and power-OFF-ON operations. During standby, the FFs consume constant leakage current, which increases the total energy consumption over time. Current leakage can largely be eliminated using power-OFF-ON operations, albeit at the cost of additional energy to store data in NVM during power-OFF operations and restore data back to the FF during power-ON operations. The BET is defined as the standby time at which the total leakage energy equals the store and restore energy. In this paper, the BET was 700 $\mu$s. For standby periods exceeding the BET, the use of power-OFF-ON operations is more energy efficient.

Fig. 11. Restore speed of various nvFFs for a different amount of cells on the power rail.

Fig. 12. Restore yield for different structures, $R_{LRS}$, and $R$ -Ratio.
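The break-even definition can be written as $\text{BET} = (E_{\text{store}} + E_{\text{restore}})/P_{\text{leak}}$, assuming a constant leakage power during standby. The short sketch below evaluates this relation; the function names are hypothetical, and the numeric values in the usage note are invented solely to land on a 700-$\mu$s BET, not measured chip-level energies.

```python
def break_even_time_s(e_store_j, e_restore_j, p_leak_w):
    """Standby duration at which leakage energy equals store + restore energy."""
    if p_leak_w <= 0:
        raise ValueError("leakage power must be positive")
    return (e_store_j + e_restore_j) / p_leak_w


def power_off_saves_energy(standby_s, e_store_j, e_restore_j, p_leak_w):
    """True when powering OFF costs less than leaking for standby_s seconds."""
    return standby_s > break_even_time_s(e_store_j, e_restore_j, p_leak_w)
```

For instance, with hypothetical values of 60 nJ store energy, 10 nJ restore energy, and 100 $\mu$W leakage power, `break_even_time_s(60e-9, 10e-9, 100e-6)` evaluates to about 700 $\mu$s, so a 1-ms standby period would favor power-OFF while a 100-$\mu$s period would not.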

Fig. 11 shows the restore delay of this work compared with the previous nvFFs. Due to the need to charge excess capacitive loading during power-up, FeRAM-based nvFFs require a 27+ times larger $T_{VDDFF\_RMP}$ compared with resistive-type nvFFs with 100 nvFFs on the power rail. EQ\_nvFF has a fast rise time, but fails at large FF counts (i.e., 2500 or 4000) as a result of the large dc current drawn from the supply, which prevents $V_{DD\_FF}$ from reaching $V_{DD}$. As a result of our higher $R$ -ratio, the decoupling of NVM devices from $V_{DD\_FF}$, and the elimination of latch equalization, the SWT-nvFF achieves 27+ times faster $T_{VDDFF\_RMP}$ for an nvFF count sufficient to ensure system stability (i.e., 2500 nvFFs).



Fig. 13. (a) Layout of a single nvFF cell, (b) placement approaches of the nvFFs in the MCU, and (c) area overhead for different application rates of nvFF.

Along with the $R$ -ratio, the absolute resistance of the NVM devices also plays a critical role in the restore performance. When both $R_{HRS}$ and $R_{LRS}$ are low ( $<5$ k$\Omega$ ), both states are equivalent to “fully turned-ON transistors,” which makes it difficult for a transistor-based latch to distinguish the two states. Likewise, if both $R_{HRS}$ and $R_{LRS}$ are high, the latch sees two “OFF transistors,” which also leads to reduced yield. Therefore, nonvolatile logics perform best when the employed NVM devices have resistances comparable to that of the latch, such that the latch sees $R_{HRS}$ as an “OFF transistor” and $R_{LRS}$ as an “ON transistor.” Fig. 12 presents the restore yield obtained from a 10 000 point Monte Carlo simulation. Current state-of-the-art MRAM has an $R$ -ratio of 2–3 and still requires restore assist techniques to overcome variation. Variations in the resistance of our ReRAM device are based on measurement results obtained from previous work using the same devices [37]. As a result of the larger $R$ -ratio ( $>5$ ) and suitable device resistance, the ReRAM-based nvFF achieves a 22+% improvement in restore yield compared with MRAM-based nvFFs.
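The dependence of restore yield on both the $R$ -ratio and the absolute resistance can be illustrated with a toy Monte Carlo experiment. The sketch below is not the simulation behind Fig. 12 and is not expected to reproduce its numbers: the RC-discharge model, the lognormal resistance spread, the Gaussian latch offset, and every default parameter value are assumptions chosen for illustration, with voltages normalized to the supply.

```python
import math
import random


def restore_yield(r_lrs_ohm, r_ratio, trials=10_000, sigma_ln_r=0.3,
                  sigma_offset_v=0.05, c_node_f=10e-15, t_sense_s=0.2e-9,
                  seed=0):
    """Toy Monte Carlo estimate of restore yield (all parameters hypothetical).

    Each trial draws lognormally distributed LRS/HRS resistances and a
    Gaussian latch offset, then checks whether the RC-discharge voltage
    difference between the two storage nodes exceeds that offset.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        r_lrs = r_lrs_ohm * math.exp(rng.gauss(0.0, sigma_ln_r))
        r_hrs = r_lrs_ohm * r_ratio * math.exp(rng.gauss(0.0, sigma_ln_r))
        # Normalized node voltages after the sensing interval.
        v_lrs_node = math.exp(-t_sense_s / (r_lrs * c_node_f))
        v_hrs_node = math.exp(-t_sense_s / (r_hrs * c_node_f))
        offset = rng.gauss(0.0, sigma_offset_v)
        if v_hrs_node - v_lrs_node > offset:
            passed += 1
    return passed / trials
```

In this toy model, `restore_yield(20e3, 5)` is higher than `restore_yield(20e3, 2)`, and a very low resistance such as `restore_yield(1e3, 5)` degrades the yield even at the same ratio, because both nodes discharge almost completely before sensing.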

Fig. 13(a) shows the layout of a single nvFF. Because current ReRAM devices require more than 2 V for store operations, IO devices are used for part of the transistors to ensure device reliability. We made the height of the nvFF the same as that of the conventional FF in order to fit the track height of standard cells for auto place and route operations. The layout of the master latch is the same as that of the conventional FF. The additional switches and nv-control circuit make the slave stage of our nvFF wider than that of the conventional FF, resulting in a 39% area overhead relative to the conventional FF. Dummy ReRAM devices are placed around regular ReRAM devices to increase the yield against proximity effects.

Fig. 14. Structure, die photo, and performance of the nv-Processor and the test chip.

Fig. 15. Measured power-OFF-ON cycle waveform of the SWT-nvFF.

For the sake of uniformity in the placement of nvFFs, the devices were clustered into groups. This also reduced the proximity effect of the ReRAM devices, decreased the area overhead of the high-voltage transistors, and enabled a more relaxed routing path for the high-voltage supply and nonvolatile control signals. The various blocks differed with regard to the grouping of nvFF clusters. Fig. 13(b) presents four possible approaches to the placement of each nvFF group in a sub-block. In case-A, nvFFs are placed along top or bottom tracks. These tracks are not the same as those used for input/output logic cells of nvFFs. In case-B, a central group of nvFFs is added to blocks with high volumes of tracks for standard cells. In case-C, each nvFF group is placed at the left or right edge of the tracks used for the input/output logic cells of nvFFs. In case-D, for blocks with a long track, additional groups are added at the center of each track.



Fig. 16. Measured self-termination waveforms for (a) match cell and (b) slow-switching cell.



Fig. 17. Measured restore performance of the SWT-nvFF.

Fig. 13(c) presents the area overhead for different usage rates of nvFF (number of nvFFs/total FFs) in an nv-Processor. To avoid excessive overhead in area, only FFs storing critical data require the nonvolatile function. The overall area overhead of applying the SWT-nvFF to 10% and 30% of the total FFs in the MCU is 1.68% and 4.50%, respectively.

### B. Implementation and Measurement Results

An nv-Processor and a test chip were fabricated using 65-nm CMOS process and logic-compatible ReRAM. Fig. 14 presents the die photo, structure, and performance of the 8-bit nv-Processor, which consists of an adaptive nonvolatile controller, 8-kB code ReRAM, 1422 adaptive nvFFs, and a configurable 4-kB nvSRAM. The test chip includes three nvFF arrays, each 1 kb in size.

Captured waveforms confirm the functionality of the FF, store, power-OFF, and restore modes of the nv-Processor. All measured delays include the test-path delay due to I/O drivers, pad delays of the test chip, and test-board parasitic load. Fig. 15 presents the power-OFF-ON cycle for the two possible data conditions. Note that when SET termination occurs, both $Q$ and $QB$ rise to 1, which can be observed as a switching of $Q$ when $Q = 0$. Fig. 16 presents the detailed waveforms of the SET/RESET termination for matched cells and slow-switching cells. A matched nvFF clearly undergoes early switching on RST\_FB (falling edge) with sub-2-ns response time, while slow-switching cells show late switching in RST\_FB. The measured $E_S$ of a matched nvFF is 99% smaller than that of a slow-switching nvFF. In Fig. 17, the measured $T_{RSTR}$ and $T_{VDDFF\_RMP}$ for a 1-K nvFF array are shown. The measured restore time is less than 20 ns with an additional $\sim 100\text{-pF}$ test-board parasitic load. Fig. 18 presents the measured Shmoo plot. The nvFF access time (including the test-path delay) was 1.26 ns at $V_{DD} = 1.1\text{ V}$ and 12.97 ns at $V_{DD} = 0.4\text{ V}$. In simulations, the nvFF presented a 20-ps penalty in CLK- $Q$ delay; however, this access-time overhead was difficult to measure because of the minimum resolution of the measurement system and the test-mode implementation.

Fig. 18. Measured Shmoo plot of the SWT-nvFF.



Fig. 19. Measured performance distribution across different dies. (a) Store energy. (b) Access time at  $V_{DD} = 1$  V and  $V_{DD} = 0.5$  V. (c) Minimum operating voltage.

Fig. 19 presents the performance results averaged across five dies. The store energy (SET + RESET) averaged 46.2 pJ/bit with distribution ranging from 39.3 to 53.2 pJ/bit. The worst case access times in FF mode at nominal  $V_{DD} = 1$  V and low  $V_{DD} = 0.5$  V were 1.97 and 8 ns, respectively. The minimum operating voltage ( $V_{min}$ ) ranged from 0.24 to 0.3 V.

## V. CONCLUSION

Nonvolatile logic combining emerging NVMs with CMOS is a promising candidate to enable processors with high performance and ultra-low power. In this paper, an SWT-nvFF using a logic-compatible ReRAM process has been proposed to address the challenges of previous nvFFs in terms of area, store energy, restore time/energy, and reliability. The SWT-nvFF achieves a 93+% reduction in store energy ( $E_S$ ) and a 27+ times reduction in restore time ( $T_{RSTR}$ ) compared with the previous nvFFs. The fabricated nv-Processor and test chips confirmed the nonvolatile functions, energy savings, and sub-20-ns restore times.

## ACKNOWLEDGMENT

The authors would like to thank the Ministry of Science and Technology, Taiwan, the Chip Implementation Center, Electronics, joint-project of National Tsing Hua University and Tsinghua University, and Taiwan Semiconductor Manufacturing Company, Hsinchu, Taiwan, for their support in testing, finance, and manufacturing.

## REFERENCES

- [1] S.-H. Song, K. C. Chun, and C. H. Kim, "A logic-compatible embedded Flash memory for zero-standby power system-on-chips featuring a multi-story high voltage switch and a selective refresh scheme," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1302–1314, May 2013.
- [2] A. S. Iyengar, S. Ghosh, and J. W. Jang, "MTJ-based state retentive flip-flop with enhanced-scan capability to sustain sudden power failure," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 8, pp. 2062–2068, Aug. 2015.
- [3] J. Yater *et al.*, "First-ever high-performance, low-power 32-bit microcontrollers with embedded nanocrystal flash and enhanced EEPROM memories," in *Proc. IEEE Int. Conf. IC Design Technol.*, Sep. 2012, pp. 1–3.
- [4] Y. Shuto, S. Yamamoto, and S. Sugahara, "Comparative study of power-gating architectures for nonvolatile FinFET-SRAM using spintronics-based retention technology," in *Proc. Design Autom. Test Eur. Conf. Exhibit. (DATE)*, 2015, pp. 866–871.
- [5] Y. Shuto, S. Yamamoto, and S. Sugahara, "New power-gating architectures using nonvolatile retention: Comparative study of nonvolatile power-gating (NVPG) and normally-off architectures for SRAM," in *Proc. Int. Conf. Microelectron. Test Struct. (ICMTS)*, 2016, pp. 136–141.
- [6] M. Qazi *et al.*, "A 3.4pJ FeRAM-enabled D flip-flop in 0.13  $\mu$ m CMOS for nonvolatile processing in digital systems," in *Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2013, pp. 192–193.
- [7] S. C. Bartling *et al.*, "An 8MHz 75  $\mu$ A/MHz zero-leakage non-volatile logic-based Cortex-M0 MCU SoC exhibiting 100% digital state retention at  $VDD=0$ V with <400ns wakeup and sleep transitions," in *Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2013, pp. 432–433.
- [8] N. Sakimura *et al.*, "A 90 nm 20 MHz fully nonvolatile microcontroller for standby-power-critical applications," in *Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2014, pp. 184–185.
- [9] P.-F. Chiu *et al.*, "A low store energy, low  $VDD_{min}$ , nonvolatile 8T2R SRAM with 3D stacked RRAM devices for low power mobile applications," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2010, pp. 229–230.
- [10] C. C. Lin *et al.*, "A 256 b-wordlength ReRAM-based TCAM with 1ns search-time and 14x improvement in wordlength-energy efficiency-density product using 2.5T1R cell," in *Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2016, pp. 136–137.
- [11] D. Suzuki *et al.*, "Fabrication of a nonvolatile lookup-table circuit chip using magneto/semiconductor-hybrid structure for an immediate-power-up field programmable gate array," in *Proc. Symp. VLSI Circuits (VLSIC)*, Jun. 2009, pp. 80–81.
- [12] S. Matsunaga *et al.*, "A 3.14  $\mu$ m 2 4T-2MTJ-cell fully parallel TCAM based on nonvolatile logic-in-memory architecture," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2012, pp. 44–45.
- [13] L.-Y. Huang *et al.*, "ReRAM-based 4T2R nonvolatile TCAM with 7x NVMStress reduction, and 4x improvement in speed-wordlength-capacity for normally-off instant-on filter-based search engines used in big-data processing," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2014, pp. 99–100.
- [14] S. Yamamoto, Y. Shuto, and S. Sugahara, "Nonvolatile SRAM (NVS RAM) using functional MOSFET merged with resistive switching devices," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2009, pp. 531–534.
- [15] A. Lee *et al.*, "RRAM-based 7T1R nonvolatile SRAM with 2x reduction in store energy and 94x reduction in restore energy for frequent-off instant-on applications," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Aug. 2015, pp. C76–C77.
- [16] W. Wang *et al.*, "Nonvolatile SRAM cell," in *IEDM Tech. Dig. Papers*, Sep. 2006, pp. 1–4.
- [17] J. H. Park *et al.*, "Fully logic compatible (1.6V Vcc, 2 additional FRAM masks) highly reliable sub 10F2 embedded FRAM with advanced direct via technology and robust 100 nm thick MOCVD PZT technology," in *IEDM Tech. Dig.*, Dec. 2004, pp. 591–594.
- [18] D. Takashima *et al.*, "A 76-mm<sup>2</sup> 8-Mb chain ferroelectric memory," *IEEE J. Solid-State Circuits*, vol. 36, no. 11, pp. 1713–1720, Nov. 2001.
- [19] K. Ikegami *et al.*, "MTJ-based ‘normally-off processors’ with thermal stability factor engineered perpendicular MTJ, L2 cache based on 2T-2MTJ cell, L3 and last level cache based on 1T-1MTJ cell and novel error handling scheme," in *IEDM Dig. Tech. Papers*, Dec. 2015, pp. 25.1.1–25.1.4.
- [20] H. Noguchi *et al.*, "4Mb STT-MRAM-based cache with memory-access-aware power optimization and write-verify-write/read-modify-write scheme," in *Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2016, pp. 132–133.

- [21] G. F. Close *et al.*, “A 256-Mcell phase-change memory chip operating at 2+ bit/cell,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 6, pp. 1521–1533, Jun. 2013.
- [22] J. Hu, M. Xie, C. Pan, C. J. Xue, Q. Zhuge, and E. H. M. Sha, “Low overhead software wear leveling for hybrid PCM + DRAM main memory on embedded systems,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 4, pp. 654–663, Apr. 2015.
- [23] K. Huang, Y. Ha, R. Zhao, A. Kumar, and Y. Lian, “A low active leakage and high reliability phase change memory (PCM) based non-volatile FPGA storage element,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 9, pp. 2605–2613, Sep. 2014.
- [24] Y. H. Tseng, C.-E. Huang, C. H. Kuo, Y. D. Chih, and C. J. Lin, “High density and ultra small cell size of contact ReRAM (CR-RAM) in 90 nm CMOS logic technology and circuits,” in *IEDM Tech. Dig. Papers*, Dec. 2009, pp. 1–4.
- [25] C. J. Chevallier *et al.*, “A 0.13  $\mu\text{m}$  64Mb multi-layered conductive metal-oxide memory,” in *Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2010, pp. 260–261.
- [26] M.-F. Chang *et al.*, “Embedded 1 Mb ReRAM in 28 nm CMOS with 0.27-to-1V read using swing-sample-and-couple sense amplifier and self-boost-write-termination scheme,” in *Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 332–333.
- [27] X. Y. Xue *et al.*, “A 0.13  $\mu\text{m}$  8Mb logic based Cu<sub>x</sub>Si<sub>y</sub>O resistive memory with self-adaptive yield enhancement and operation power reduction,” in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2012, pp. 42–43.
- [28] J. W. Ryu and K. W. Kwon, “A reliable 2T2MTJ nonvolatile static gain cell STT-MRAM with self-referencing sensing circuits for embedded memory application,” *IEEE Trans. Magn.*, vol. 52, no. 4, pp. 1–10, Apr. 2016.
- [29] S. Clima *et al.*, “Intrinsic tailing of resistive states distributions in amorphous HfO<sub>x</sub> and TaO<sub>x</sub> based resistive random access memories,” *IEEE Electron Device Lett.*, vol. 36, no. 8, pp. 769–771, Aug. 2015.
- [30] W. S. Khwa *et al.*, “A procedure to reduce cell variation in phase change memory for improving multi-level-cell performances,” in *Proc. IEEE Int. Memory Workshop (IMW)*, 2015, pp. 1–4.
- [31] W. C. Luo *et al.*, “Statistical model and rapid prediction of RRAM SET speed-disturb dilemma,” *IEEE Trans. Electron Devices*, vol. 60, no. 11, pp. 3760–3766, Nov. 2013.
- [32] Y. Hosoi *et al.*, “High speed unipolar switching resistance RAM (RRAM) technology,” in *IEDM Dig. Tech. Papers*, Apr. 2006, pp. 1–4.
- [33] J. Song *et al.*, “Effects of RESET current overshoot and resistance state on reliability of RRAM,” *IEEE Electron Device Lett.*, vol. 35, no. 6, pp. 636–638, Jun. 2014.
- [34] H. Y. Lee *et al.*, “Evidence and solution of over-RESET problem for HfO<sub>x</sub> based resistive memory with sub-ns switching speed and high endurance,” in *IEDM Tech. Dig. Papers*, Dec. 2010, pp. 460–463.
- [35] D. Halupka *et al.*, “Negative-resistance read and write schemes for STT-MRAM in 0.13  $\mu\text{m}$  CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Apr. 2010, pp. 256–257.
- [36] Y. Liu *et al.*, “A 65nm ReRAM-enabled nonvolatile processor with 6x reduction in restore time and 4x higher clock frequency using adaptive data retention and self-write-termination nonvolatile logic,” in *Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2016, pp. 84–86.
- [37] M.-F. Chang *et al.*, “A 0.5V 4 Mb logic-process compatible embedded resistive RAM (ReRAM) in 65 nm CMOS using low-voltage current-mode sensing scheme with 45 ns random read time,” *IEEE J. Solid-State Circuits*, vol. 48, no. 9, pp. 2250–2259, Sep. 2013.



**Albert Lee** received the bachelor’s and master’s degrees from National Tsing Hua University, Hsinchu, Taiwan, in 2013 and 2015, respectively. He is currently pursuing the Ph.D. degree with the University of California at Los Angeles, Los Angeles, CA, USA.

His current research interests include the application of emerging memory devices with logic, spin-based phenomena, magnetic devices, memory circuits, silicon neurons, and neuroelectrodynamical systems.



**Chieh-Pu Lo** received the B.S. and M.S. degrees in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2014 and 2016, respectively.

He is currently an Engineer with Taiwan Semiconductor Manufacturing Company, Hsinchu. He is also with National Tsing Hua University. His current research interests include emerging memory circuit design and neuromorphic circuit design.



**Chien-Chen Lin** received the B.S. and M.S. degrees in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2013 and 2015, respectively.

In 2015, he joined Taiwan Semiconductor Manufacturing Company, Hsinchu. He is currently with National Tsing Hua University. His current research interests include the circuit design of emerging nonvolatile memory and embedded memory.



**Wei-Hao Chen** received the B.S. degree from the Department of Electrophysics, National Chiayi University, Chiayi, Taiwan, in 2011, and the M.S. degree from the Department of Physics, National Chung Hsing University, Taichung, Taiwan, in 2013. He is currently pursuing the Ph.D. degree with the Institute of Electronics Engineering, National Tsing Hua University, Hsinchu, Taiwan.

His current research interests include 3-D memory circuits, emerging memory-based neuromorphic circuits, and computing in memory.



**Kuo-Hsiang Hsu** received the B.S. degree in electrical engineering from Tamkang University, New Taipei City, Taiwan, in 2015. He is currently pursuing the M.S. degree in electrical engineering with National Tsing Hua University, Hsinchu, Taiwan.

His current research interests include the circuit design of emerging non-volatile memory and embedded memory.



**Zhibo Wang** was born in 1990. He received the B.S. degree from the Electronic Engineering Department, Tsinghua University, Beijing, China, in 2013, where he is currently pursuing the Ph.D. degree.

His current research interests include nonvolatile computing, nonvolatile circuits, and ultra-low-power VLSI design.



**Fang Su** received the B.S. degree from the Electronic Engineering Department, Tsinghua University, Beijing, China, in 2015, where he is currently pursuing the Ph.D. degree.

His current research interests include low-power systems for AI-Internet-of-Things applications using emerging nonvolatile memory.



**Zhe Yuan** received the B.S. degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2015, where he is currently pursuing the Ph.D. degree.

He is currently involved in architecture and circuit optimization for nonvolatile field-programmable gate arrays. His current research interests include low-power VLSI designs, nonvolatile memory, and electronic design automation.



**Qi Wei** received the Ph.D. degree from Tsinghua University, Beijing, China, in 2010.

He is currently an Assistant Professor with the Department of Electronics Engineering, Tsinghua University. His current research interests include analog IC design and high-performance data converters, including high-performance operational amplifiers, pipeline analog-to-digital converters (ADCs), successive approximation register ADCs, and current DACs.



**Ya-Chin King** was born in Taiwan. She received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1992, and the M.S. degree in electrical engineering and the Ph.D. degree, with a focus on thin oxide technology and novel quasi-nonvolatile memory, from the University of California at Berkeley, Berkeley, CA, USA, in 1994 and 1999, respectively.

She joined the Faculty of National Tsing Hua University, Hsinchu, Taiwan, in 1999, where she is currently a Professor with the Electrical Engineering Department. Her current research topics include advanced gate dielectrics, CMOS image sensors, and non-volatile memory design.



**Chrong-Jung Lin** (SM'13) was born in Taipei, Taiwan, in 1969. He received the B.S. and M.S. degrees from National Tsing Hua University (NTHU), Hsinchu, Taiwan, in 1991 and 1992, respectively, and the Ph.D. degree in electrical engineering from NTHU in 1996, with a focus on the tunneling enhancement effect of silicon nano-crystals in Flash memory applications.

Since 1996, he has been with the Research and Development Division, Taiwan Semiconductor Manufacturing Company, Hsinchu, where he has been granted 57 U.S. patents and 43 Taiwan patents in the research field. Since 2005, he has been teaching with the Department of Electrical Engineering, Institute of Electronics Engineering, NTHU, and leading the Microelectronics Laboratory, Advanced Flash Memory Center, NTHU. His research team has developed and published many emerging and novel memory devices and technologies.

Dr. Lin is currently a Program Committee Member of the IEEE International Electron Devices Meeting and the International Symposium on VLSI Technology, Systems and Applications.



**Hochul Lee** (S'13) received the B.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2005, and the M.S. degree from the Semiconductor Material Device Laboratory, Seoul National University, Seoul, South Korea. He is currently pursuing the Ph.D. degree with the Device Research Laboratory, University of California at Los Angeles, Los Angeles, CA, USA, with a focus on magnetic tunnel junction-based hybrid CMOS circuits.

He was with the Flash Memory Circuit Design Team, Samsung Electronics, Hwasung, South Korea, until 2012.



**Pedram Khalili Amiri** (M'05) received the B.Sc. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2004, and the Ph.D. (*cum laude*) degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands, in 2008.

He joined the Department of Electrical Engineering, University of California at Los Angeles, Los Angeles, CA, USA, in 2009, where he is currently an Assistant Adjunct Professor.

Dr. Khalili Amiri is currently serving on the Technical Program Committee of the Joint MMM/Intermag Conference. He is also serving as a Guest Editor of *Spin*.



**Kang-Lung Wang** (F'92) received the B.S. degree from National Cheng Kung University, Tainan, Taiwan, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, USA.

He is currently a Distinguished Professor and a Raytheon Chair Professor in Physical Science and Electronics with the Electrical Engineering Department, University of California at Los Angeles, Los Angeles, CA, USA. His current research interests include nanoscale physics and materials, topological insulators, and spintronics and devices.

Dr. Wang is a member of the American Physical Society. He serves as an Editor for Artech House and other publications.



**Yu Wang** (SM'14) received the B.S. degree and the Ph.D. degree (Hons.) from Tsinghua University, Beijing, China, in 2002 and 2007, respectively.

He is currently an Associate Professor with the Department of Electronic Engineering, Tsinghua University. He has authored or co-authored over 130 papers in refereed journals and conferences. His current research interests include parallel circuit analysis, application-specific hardware computing (especially for brain-related problems), and power/reliability-aware system design methodology.

Dr. Wang serves as a TPC Member for many important conferences, such as DAC, FPGA, Design, Automation & Test in Europe, Asia and South Pacific Design Automation Conference (ASPDAC), International Symposium on Low Power Electronics and Design (ISLPED), International Symposium on Quality Electronic Design, International Conference on Field Programmable Technology (ICFPT), and IEEE Computer Society Annual Symposium on VLSI (ISVLSI). He was a recipient of the IBM X10 Faculty Award in 2010, the Best Paper Award in ISFPGA 2017 and ISVLSI 2012, and six Best Paper Nominations in ASPDAC/CODES/ISLPED. He was the TPC Co-Chair of ICFPT 2011 and the Finance Chair of ISLPED 2012–2015. He serves as an Associate Editor of the IEEE TRANSACTIONS ON CAD and the *Journal of Circuits, Systems, and Computers*.



**Huazhong Yang** (SM'00) was born in Ziyang, China, in 1967. He received the B.S. degree in microelectronics and the M.S. and Ph.D. degrees in electronic science and technology from Tsinghua University, Beijing, China, in 1989, 1993, and 1998, respectively.

He joined the Department of Electronic Engineering, Tsinghua University in 1993, where he has been a Full Professor since 1998. He is currently a specially appointed Professor of the Cheung Kong Scholars Program.



**Yongpan Liu** (SM'15) received the B.S., M.S., and Ph.D. degrees from the Electronic Engineering Department, Tsinghua University, Beijing, China, in 1999, 2002, and 2007, respectively.

He was a Visiting Scholar with Pennsylvania State University, State College, PA, USA, in 2014. He is currently a Key Member of the Tsinghua-Rohm Research Center, Beijing, and the Research Center of Future ICs, Beijing. He is also an Associate Professor with the Department of Electronic Engineering, Tsinghua University. He has authored over 60 peer-reviewed conference and journal papers and led over six chip design projects for sensing applications, including the first nonvolatile processor (THU1010N). His research is supported by the NSFC, the 863 and 973 Programs, and industry companies, such as Huawei, Rohm, and Intel. His current research interests include nonvolatile computation, low-power VLSI design, emerging circuits and systems, and design automation.

Dr. Liu is an ACM and IEICE Member. He received the Design Contest Awards from the International Symposium on Low Power Electronics and Design (ISLPED) in 2012 and 2013 and the Best Paper Award at HPCA 2015. He has served on several conference technical program committees, such as DAC, Design, Automation & Test in Europe, Asia and South Pacific Design Automation Conference, ISLPED, Asian Solid-State Circuits Conference, International Conference on Computer Design, and International Conference on VLSI Design.



**Meng-Fan Chang** (M'05–SM'14) received the M.S. degree from Pennsylvania State University, State College, PA, USA, and the Ph.D. degree from National Chiao Tung University, Hsinchu, Taiwan.

He worked in industry for over ten years. From 1996 to 1997, he was with Mentor Graphics, NJ, USA, where he designed memory compilers. From 1997 to 2001, he was with the Design Service Division (DSD), Taiwan Semiconductor Manufacturing Company, Hsinchu, where he designed embedded SRAMs and Flash. From 2001 to 2006, he was a Co-Founder and the Director of IPLib Company, Taiwan, where he developed embedded SRAM and ROM compilers, Flash macros, and flat-cell ROM products. He is currently a Full Professor with National Tsing Hua University (NTHU), Hsinchu. His current research interests include circuit designs for volatile and nonvolatile memory, ultra-low-voltage systems, 3-D memory, circuit-device interactions, memristor logics for neuromorphic computing, and computing-in-memory.

Dr. Chang received the Academia Sinica (Taiwan) Junior Research Investigators Award in 2012 and the Ta-You Wu Memorial Award of National Science Council (Taiwan) in 2011. He also received numerous awards from Taiwan's National Chip Implementation Center, NTHU, MXIC Golden Silicon Awards, and Industrial Technology Research Institute. He is the corresponding author of numerous International Solid-State Circuits Conference (ISSCC), Symposium on VLSI Circuits, International Electron Devices Meeting, and DAC papers. He has been serving on Technical Program Committees for ISSCC, IEDM, Asian Solid-State Circuits Conference, International Symposium on Circuits and Systems, International Symposium on VLSI Design, Automation and Test, and numerous international conferences. He has been a Distinguished Lecture Speaker for the IEEE Circuits and Systems Society. He served as the Associate Executive Director for Taiwan's National Program of Intelligent Electronics from 2011 to 2016. He is an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, and *IEICE Electronics*.