

**ACCEPTED MANUSCRIPT**

## Device and materials requirements for neuromorphic computing

To cite this article before publication: Raisul Islam *et al* 2018 *J. Phys. D: Appl. Phys.* in press <https://doi.org/10.1088/1361-6463/aaf784>

### Manuscript version: Accepted Manuscript

Accepted Manuscript is “the version of the article accepted for publication including all changes made as a result of the peer review process, and which may also include the addition to the article by IOP Publishing of a header, an article ID, a cover sheet and/or an ‘Accepted Manuscript’ watermark, but excluding any other editing, typesetting or other changes made by IOP Publishing and/or its licensors”

This Accepted Manuscript is © 2018 IOP Publishing Ltd.

During the embargo period (the 12 month period from the publication of the Version of Record of this article), the Accepted Manuscript is fully protected by copyright and cannot be reused or reposted elsewhere.

As the Version of Record of this article is going to be / has been published on a subscription basis, this Accepted Manuscript is available for reuse under a CC BY-NC-ND 3.0 licence after the 12 month embargo period.

After the embargo period, everyone is permitted to use copy and redistribute this article for non-commercial purposes only, provided that they adhere to all the terms of the licence <https://creativecommons.org/licenses/by-nc-nd/3.0>

Although reasonable endeavours have been taken to obtain all necessary permissions from third parties to include their copyrighted content within this article, their full citation and copyright line may not be present in this Accepted Manuscript version. Before using any content from this article, please refer to the Version of Record on IOPscience once published for full citation and copyright details, as permissions will likely be required. All third party content is fully copyright protected, unless specifically stated otherwise in the figure caption in the Version of Record.


## Device and Materials Requirements for Neuromorphic Computing

Raisul Islam<sup>1</sup>, Haitong Li<sup>1</sup>, Pai-Yu Chen<sup>2</sup>, Weier Wan<sup>1</sup>, Hong-Yu Chen<sup>3</sup>, Bin Gao<sup>4</sup>, Huaqiang Wu<sup>4</sup>, Shimeng Yu<sup>5</sup>, Krishna Saraswat<sup>1</sup>, H.-S. Philip Wong<sup>1</sup>

<sup>1</sup>Department of Electrical Engineering, Stanford University, 420 Via Palou Mall, Stanford, CA 94305, USA

<sup>2</sup>School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA

<sup>3</sup>GigaDevice Semiconductor Inc., A12, USTB Techart Plaza, Xueyuan Road 30, Haidian District, Beijing, China

<sup>4</sup>Institute of Microelectronics, Tsinghua University, Beijing, 100084, China

<sup>5</sup>School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

raisul@stanford.edu, hspwong@stanford.edu

Energy efficient hardware implementation of artificial neural networks is challenging due to the "memory-wall" bottleneck. Neuromorphic computing promises to address this challenge by eliminating data movement to and from off-chip memory devices. Emerging non-volatile memory devices that exhibit gradual changes in resistivity are a key enabler of in-memory computing - a type of neuromorphic computing. In this paper, we present a review of some of the non-volatile memory devices (RRAM, CBRAM, PCM) commonly used in neuromorphic applications. The review focuses on the trade-offs between device parameters such as retention, endurance, device-to-device variation, speed and resistance levels, and their interplay with target applications. This work aims to provide guidance for finding optimized resistive memory device material stacks suitable for neuromorphic applications.

## 1 Introduction

First coined by Carver Mead in 1990 [1], the term "neuromorphic computing" refers to a computing paradigm inspired by the cognitive functionality of the human brain. In today's data-centric world, where some of the most useful computing tasks involve extracting meaningful information from massive amounts of unstructured data, neuromorphic computing can provide low-energy, high-throughput computation. The challenge of data-centric intelligent computing on conventional architectures lies in the energy and latency bottleneck of off-chip memory access (i.e. the "memory wall"), which does not scale down with the technology node [2]. To overcome this problem, new in-memory computing paradigms have been proposed [3]–[13] for accelerating the deep neural networks (DNNs) used in data-centric computing. In-memory computing can exploit certain properties of emerging non-volatile memory devices, such as the gradual change of resistance under a train of constant-voltage pulses. Besides application-oriented accelerator hardware for neural networks, neuromorphic computing may also aim at emulating brain-like learning behavior (e.g. spike timing dependent plasticity (STDP)) in electronic systems. Conventional hardware such as CPUs and GPUs is not energy efficient at emulating brain-like functionality [14]. As an alternative, resistive memory serving as the synaptic connection between two neurons is promising for brain-inspired computing. Such non-volatile memories with multiple resistance states can be integrated on-chip as analog weight storage, reducing the memory access overhead. Alternatively, they can allow certain processing tasks to be performed in memory, further reducing the memory access overhead at lower energy cost [15].

Analog-programmable non-volatile memory devices such as resistive RAM (RRAM), conductive bridging RAM (CBRAM), phase change memory (PCM) and spin-transfer torque magnetic RAM (STT-MRAM) lie at the heart of such neuromorphic computing systems. A fundamental circuit element with resistive memory, termed the ‘memristor’, was theorized by Chua et al. [16]. Later, Strukov et al. proposed that Pt/TiO<sub>2-x</sub>/Pt resistive switching devices are the physical embodiment of memristors [17]. Although these works had a significant impact on the field of non-volatile memory devices for neuromorphic computing, it was later shown that typical resistive memory devices (e.g. RRAM, CBRAM, PCRAM) are not equivalent, in their working principle [18], to the memristor theorized by Chua et al. [16]. Non-volatile memory devices were originally developed as digital memories for on-chip memory or non-volatile data storage. However, one important capability of these devices is multi-bit storage, where instead of two resistance levels, multiple levels encode multi-bit information. This gradual switching of the resistance levels is the key to neuromorphic applications. While extensive reviews of emerging non-volatile memory devices for storage-class memory applications exist in the literature, a comprehensive review of the device and materials requirements, and the possible trade-offs, for neuromorphic applications is missing. Note that resistive memory devices based on organic materials form a different class of devices suitable for neuromorphic computing. These devices are not yet mature enough for commercial technology, yet they show promising characteristics such as excellent analog tunability, linear conductance change and extremely low switching energy. A detailed review of the state of the art of such devices is presented elsewhere [19] and is beyond the scope of this paper. Hence, the goal of this paper is to review non-volatile memory devices based on inorganic materials for neuromorphic applications. Two related but broader reviews on this topic were published by Burr et al. [20] and Yu et al. [21]. This review focuses more on device trade-offs for hardware artificial neural networks exploiting in-memory computing principles.

The paper is organized in three main parts. First, we explain the challenges faced by state-of-the-art hardware accelerators reported in the literature and show how non-volatile memory devices could be useful in such systems. Then we review the various non-volatile memory technologies that have already been demonstrated for this application. Finally, we discuss the possible device trade-offs in designing neuromorphic hardware.

## 2 Overview of Neuromorphic Computing


Neuromorphic computing can be broadly classified into two categories: (a) biology-based models/algorithms, which study the learning and inference processes of the human brain and emulate those functionalities in hardware, and (b) artificial neural networks (ANNs), which are algorithms for solving machine learning problems that are inspired by the brain to some extent (network layers are constructed from connections between neurons, termed synapses) but do not necessarily correlate directly with brain functionality. The human brain consists of neurons interconnected by a highly complex network of synapses. Each neuron is connected to multiple neurons through synapses. Neurons generate action potentials (spikes) that are transmitted to the other connected neurons through synapses. The communication between neurons is controlled by the synapses, which modify the strength of the connection based on previous activity. This process is called plasticity and is responsible for learning and memory formation.

Communication between neurons through spiking signals results in the modification of synaptic connection strength. This synaptic strength modulation forms the basis for learning that can be emulated in hardware (class (a)). One such learning paradigm is spike timing dependent plasticity (STDP) [22]: each neuron integrates all incoming action potentials, and when the integrated signal exceeds a certain threshold it fires a spike, which contributes to learning by changing the synaptic connection strength according to the relative timing of the pre-synaptic and post-synaptic pulses. STDP can be a “local” learning mechanism, which is applicable only to emulating brain-like behavior. Hardware implementations of such learning models, derived directly from the brain’s learning mechanisms, have been studied and reviewed extensively by Kuzum et al. [15]. STDP can also be viewed as a “global” learning mechanism that requires weight updates (for instance via error backpropagation) for neuromorphic computing applications. Our discussion in this paper focuses on hardware acceleration of artificial neural networks using emerging non-volatile memory devices. ANNs are inspired by the neural connectivity of the human brain, yet do not correspond to any specific biological learning model. Fig. 1 shows the development of neuromorphic computing and its categories. This paper focuses on the highlighted box of the design paradigm.
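To make the STDP description concrete, the sketch below pairs a minimal (non-leaky) integrate-and-fire neuron with the exponential pair-based STDP window that is commonly used in the literature. The window shape, the amplitudes `A_PLUS`/`A_MINUS` and the time constants are illustrative assumptions, not parameters taken from this paper or from [22].

```python
import math

# Hypothetical STDP parameters (illustrative only).
A_PLUS, A_MINUS = 0.01, 0.012        # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0     # time constants in ms

def stdp_dw(dt_ms):
    """Weight change for one spike pair, dt_ms = t_post - t_pre (pair-based STDP)."""
    if dt_ms > 0:                    # pre fires before post -> potentiation
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    if dt_ms < 0:                    # post fires before pre -> depression
        return -A_MINUS * math.exp(dt_ms / TAU_MINUS)
    return 0.0

def integrate_and_fire(inputs, threshold=1.0):
    """Integrate input per time step; spike and reset when the threshold is reached."""
    v, spikes = 0.0, []
    for t, x in enumerate(inputs):
        v += x
        if v >= threshold:
            spikes.append(t)         # record the firing time step
            v = 0.0                  # reset membrane potential after the spike
    return spikes

print(integrate_and_fire([0.3, 0.4, 0.5, 0.2, 0.9]))
for dt in (-40, -10, 10, 40):
    print(dt, round(stdp_dw(dt), 5))
```

The closer the pre- and post-synaptic spikes are in time, the larger the weight change; the sign depends only on their order, which is what makes the rule "local".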

## 3 Hardware Acceleration for Neural Networks

Deep neural networks (DNNs) are a class of artificial neural networks (ANNs) featuring a considerable increase in network depth to build richer representations of the input data. DNNs have been gaining great momentum for tackling large-scale perceptual tasks such as computer vision and natural language understanding. This section takes DNNs as a case study, among the many variants of ANNs, to illustrate the close interaction between the development of hardware primitives and neural networks. DNNs have benefited from both the availability of big data (large amounts of multimedia data for model training) and the large performance improvement of computing hardware in the past decade. Recent DNNs feature an increase in both model size (defined as the number of static weights after training) and computational complexity (feedforward and backward) [23]–[27], to meet the requirements of demanding tasks such as video processing [28]. Many recent general-purpose hardware platforms (e.g., CPUs and GPUs) employ special features such as vector instructions [29] and mixed-precision operations [30] to improve parallelism for DNN inference and training. However, the memory hierarchy of these general-purpose architectures is not designed to leverage the predictable dataflow and potential data reuse of DNN processing. Therefore, a large portion of memory accesses goes to the slower and more energy-consuming levels of the memory hierarchy (e.g., an access to off-chip DRAM main memory consumes much more energy than an access to a small local register file), limiting the compute throughput and energy efficiency of DNN processing. To reduce these expensive memory accesses, DNN hardware accelerators employ a more fine-grained local memory hierarchy and a more specialized dataflow, which improves energy efficiency and throughput while maintaining the DNN’s inference accuracy.
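The effect of where operands are fetched from can be illustrated with a weighted average of per-access energies. The relative costs below are placeholders chosen only to reflect the ordering described in the text (DRAM ≫ on-chip buffer > register file), and the fetch fractions for the two cases are hypothetical; none of these numbers are measurements.

```python
# Hypothetical relative energy per access (register file = 1); not measured values.
ENERGY_PER_ACCESS = {"register_file": 1.0, "on_chip_buffer": 6.0, "dram": 200.0}

def energy_per_mac(fetch_fraction):
    """Average memory energy per MAC, given the fraction of operand fetches
    served by each memory level (fractions must sum to 1)."""
    assert abs(sum(fetch_fraction.values()) - 1.0) < 1e-9
    return sum(ENERGY_PER_ACCESS[level] * f for level, f in fetch_fraction.items())

# General-purpose-like case: assume 10% of fetches go to off-chip DRAM.
cpu_like = {"register_file": 0.6, "on_chip_buffer": 0.3, "dram": 0.1}
# Accelerator-like case with local reuse: assume only 1% reach DRAM.
accel_like = {"register_file": 0.9, "on_chip_buffer": 0.09, "dram": 0.01}

print("general-purpose:", energy_per_mac(cpu_like))
print("accelerator:    ", energy_per_mac(accel_like))
```

Even with these rough numbers, the DRAM term dominates the first case, which is why accelerators focus on keeping reusable data in small local memories.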
Modern DNNs typically consist of convolutional (CONV) layers and fully-connected (FC) layers as trainable layers (containing weight parameters), interwoven with pre-defined non-linear activation, normalization, pooling and regularization layers that are typically not compute-intensive. In contrast, both CONV and FC layers require intensive multiply-and-accumulate (MAC) operations during both feedforward and back-propagation computation. These MAC operations, performed over millions of weight parameters in a DNN, impose a stringent requirement on the efficiency of memory access. Therefore, the goal of state-of-the-art DNN accelerator designs is two-fold: accelerating the MAC operations while minimizing the energy cost of data movement. In this section, we first review the design and optimization methodologies of DNN accelerators. Then, the architectural implications of using new memory technologies in this context are discussed.
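The MAC counts of CONV and FC layers follow directly from their shapes. The sketch below uses standard layer-shape arithmetic (no bias terms, stride and padding folded into the output size); the example layer shapes are chosen for illustration and are not taken from a specific network in the text.

```python
def conv_layer_cost(h_out, w_out, c_in, c_out, k):
    """Weights and MACs of a k x k CONV layer producing an h_out x w_out x c_out map."""
    weights = k * k * c_in * c_out
    macs = weights * h_out * w_out      # every weight is reused at each output position
    return weights, macs

def fc_layer_cost(n_in, n_out):
    """Weights and MACs of a fully-connected layer for one input vector."""
    weights = n_in * n_out
    return weights, weights             # every weight is used exactly once per input

conv_w, conv_macs = conv_layer_cost(56, 56, 64, 64, 3)   # hypothetical 3x3 CONV layer
fc_w, fc_macs = fc_layer_cost(4096, 1000)                 # hypothetical FC classifier
print(f"CONV: {conv_w} weights, {conv_macs} MACs, reuse {conv_macs // conv_w}x")
print(f"FC:   {fc_w} weights, {fc_macs} MACs, reuse {fc_macs // fc_w}x")
```

The reuse factor explains the difference in memory pressure: a CONV weight fetched once serves thousands of MACs, while an FC weight serves only one MAC per input vector, so FC-heavy networks tend to be limited by memory access.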

### 3.1 Design and Optimization Methodologies

Modern DNNs are both computation-intensive and memory-intensive. As seen in some popular DNN architectures, the total number of weights is on the order of tens or hundreds of millions, while the total number of MAC operations during inference can be two to three orders of magnitude larger [23], [24], [26], [27]. For instance, a ResNet-50 network trained on the ImageNet dataset contains over 20 M weights and requires about 4 G MAC operations per image [23]. Performing inference for a batch of 16 images using ResNet-50 on two Intel Xeon E5-2630 v3 processors takes more than 6.6 seconds [31]. Every MAC operation during the inference phase of a DNN is associated with several memory accesses for weight data, activation data, and partial-sum data before and after the computation. These memory accesses can be rather inefficient on general-purpose architectures, as a substantial portion goes to relatively slow and energy-hungry off-chip DRAM. Therefore, to address this memory wall and minimize data fetching/movement costs, the first methodology adopted by DNN accelerators is the use of spatial architectures, which consist of distributed arithmetic logic units (ALUs), localized (yet capacity-limited) memories (e.g., register files, local buffers), and an on-chip network that enables direct communication between ALUs. Some early examples include neuFlow [32] and DianNao [33]. The former uses local registers to store frequently accessed weights for each MAC unit, while the latter uses scratchpad SRAMs to store weights and intermediate inputs/outputs. In addition to minimizing the energy of reading weights from memory, ShiDianNao (one of the successors of DianNao) [34] is designed to minimize memory write accesses by grouping the MAC outputs from adjacent ALUs before writing back to SRAMs.
The strategies employed in [32]–[34] can be summarized as minimizing read and write accesses by handling, caching, and processing reusable data in a DNN computation flow, and can be seen in several other reported DNN accelerators as well [35], [36]. More recently, Eyeriss combines these two strategies to further improve data reuse, by efficiently compiling and mapping DNN parameters from DRAM into scratchpad SRAMs and local registers [8]. As neural networks can be viewed as arbitrary function approximators from an algorithmic perspective, weight precision reduction and network pruning may be used to compress large DNN models into smaller models that better fit hardware constraints during deployment [7]. Some DNN accelerator designs exploit this methodology by mapping compressed DNN models to reduce the energy and area costs of high-precision arithmetic, as such compressed models typically require less storage and fewer compute resources. For example, EIE [7] and SCNN [12] are inference accelerators that use network pruning, which sets redundant weight parameters to zero. EIE is designed to compute on the sparse representation obtained after pruning FC layers, while SCNN applies such sparse processing to CONV layers. Google's Tensor Processing Unit (TPU) reduces the precision to 8-bit integer arithmetic [37], while other accelerators explore even fewer bits, including ternary/binary representations [3], [38], to improve throughput and energy efficiency.
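A minimal sketch of the two compression ideas mentioned above, magnitude pruning followed by symmetric 8-bit quantization, is shown below. It is a generic textbook version written for illustration; it is not the specific scheme used by EIE, SCNN or the TPU, and the weight values are made up.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric per-tensor linear quantization to signed 8-bit integers."""
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid division by zero
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

w = [0.02, -0.75, 0.4, -0.01, 0.9, 0.05, -0.3, 0.001]   # hypothetical weights
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
w_restored = [qi * scale for qi in q]                     # dequantized weights
print(w_pruned)
print(q, scale)
```

After pruning, only the non-zero entries need to be stored and multiplied, and each stored weight needs 8 bits instead of 32, which is the storage and arithmetic saving that compressed-model accelerators exploit.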

### 3.2 Architectural Implications for Memory Technologies

Present accelerator designs place central emphasis on memory hierarchy optimization and its interplay with on-chip computation resources. However, the "memory bottleneck" in modern DNNs may not be fully addressed by these acceleration architectures alone. In fact, memory access remains the bottleneck for many DNN inference workloads deployed on accelerator hardware, especially for networks consisting mainly of fully-connected layers, such as multilayer perceptrons (MLPs) and Long Short-Term Memory (LSTM) networks [37]. Moreover, future DNNs can be expected to grow rapidly in network depth and computational complexity. As an example, DNNs with convolutional layers for image applications have grown from 8 layers (AlexNet [24]) to over 100 layers (ResNet [23]) to handle the rich information in natural images. The state-of-the-art YOLO network [39] for real-time object detection performs computations on many small grid cells of a single image or video frame, which points to the growing need for more data-intensive, fine-grained multimedia processing. Thus, real-time processing of high-resolution video will require even more computing capability to handle parallel processing with large DNNs, and concurrent memory accesses with as little bandwidth limitation as possible. Contemporary accelerator designs still face the memory bandwidth and capacity wall, as typical on-chip registers and SRAM buffers can only provide KB- to MB-scale data memory [8], [33], [37], which is much smaller than off-chip DRAM capacity. This has driven several accelerator works towards alternative memory technologies. For instance, DaDianNao [6], another successor of DianNao, uses 36 MB/chip of embedded DRAM (eDRAM) to provide a somewhat larger on-chip storage capacity than SRAM. However, such an approach may not scale well, due to the added cost of eDRAM technology and its limited benefit in on-chip storage capacity.
At the state-of-the-art 14 nm node, ~70% array efficiency has been demonstrated for on-chip SRAM in high-volume manufacturing [40]. If ~80% of a chip in a future node (~7 nm), where the SRAM bit-cell area is  $0.027 \mu\text{m}^2$  [41], were devoted to SRAM macros, a typical large die ( $815 \text{ mm}^2$ , NVIDIA V100 [42]) could accommodate ~2 GB of SRAM. 2 GB of SRAM sounds sufficient to hold most of today's DNN weights on-chip; however, in this case the standby leakage of the SRAM array may dominate the entire chip's power consumption, which makes it impractical. Considering the memory wall faced by modern DNNs, emerging memory technologies may play an important and unique role. The candidates being actively investigated by the device and materials communities include PCM [43], [44], RRAM [45], [46], CBRAM [47], and STT-MRAM [48]. As these technologies can potentially offer up to terabytes of on-chip data storage with a wide range of energy-delay optimization opportunities, they may complement SRAM for more efficient DNN inference acceleration. Architecture simulation studies have shown that RRAM crossbar arrays can provide MAC processing capability and on-chip data storage at the same time [9], [10]. These studies use the structural parallelism and current summation properties of the crossbar, but do not fully exploit the analog programmable properties of resistive non-volatile memories. Thus, there is an even larger design space in which emerging memory technologies can be exploited as key compute and storage components for efficient hardware implementations of DNNs. The following sections address this topic in detail.
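The ~2 GB figure can be reproduced from the quoted numbers. The text does not spell out the calculation; one reading that gives ~2 GB, assumed here, is that the ~70% array efficiency applies inside the SRAM macros that occupy ~80% of the die.

```python
DIE_AREA_MM2 = 815.0          # NVIDIA V100 die area quoted in the text
SRAM_MACRO_FRACTION = 0.80    # fraction of the die assumed to be SRAM macros
ARRAY_EFFICIENCY = 0.70       # bit-cell area / macro area (assumption: 14 nm figure carries over)
BITCELL_AREA_UM2 = 0.027      # projected ~7 nm SRAM bit-cell area

die_area_um2 = DIE_AREA_MM2 * 1e6    # 1 mm^2 = 1e6 um^2
bits = die_area_um2 * SRAM_MACRO_FRACTION * ARRAY_EFFICIENCY / BITCELL_AREA_UM2
gigabytes = bits / 8 / 1e9           # decimal gigabytes
print(f"{bits:.3g} bits ~ {gigabytes:.2f} GB")
```

Without the array-efficiency factor the same die would hold roughly 3 GB, so the estimate is sensitive to how much of each macro is periphery rather than bit cells.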

### 3.3 Non-volatile Memory as Analog Synaptic Weights in Neural Networks

A possible application of emerging non-volatile memory (NVM) devices is to serve as in-memory computing elements, where the multi-level resistance response of an NVM stores the analog synaptic weights of a DNN on-chip. After these analog weights are read, conventional hardware can perform the usual arithmetic operations. Such schemes bring the memory closer to the computing element, but the computation is not done inside the memory. In another in-memory computing scheme, a crossbar array of non-volatile memory devices performs the multiply-and-accumulate (MAC) operation at lower energy cost: the input vector is encoded as analog voltages and the weight matrix is encoded as analog resistance (conductance) values stored in the memory devices. Fig. 2 shows the typical mathematical abstraction of a single-layer perceptron. If the input vector is encoded as analog voltages and the weight matrix as the conductance values of a resistive memory array (Fig. 2b), each output current represents the result of a MAC operation (Ohm's law gives the per-cell product and Kirchhoff's current law sums the currents along each column). The ability of non-volatile memory devices such as RRAM, PCM and CBRAM to change their resistance gradually as a function of the voltage pulses applied across their electrodes is the key to performing analog in-memory MAC operations (Fig. 2). However, if the NVM device has a non-linear I-V curve (which is typically the case in the high resistance state), using analog voltages as inputs causes large errors, because the conductance varies with the read voltage. One solution is to use a train of identical pulses as input, where the number of pulses represents the input value. Another solution is to encode the input so as to compensate for the NVM's non-linearity; to the best of our knowledge, this has yet to be studied in detail. In many machine learning algorithms (e.g. stochastic gradient descent, contrastive divergence training of Restricted Boltzmann Machines), the weight update can be written as a sum of outer products of two vectors.
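The difference between the two input-encoding options can be checked numerically. The sketch assumes a hypothetical sinh-shaped cell I-V, I = g·sinh(a·v)/a, as a stand-in for a non-linear device; signed weights, wire resistance and peripheral circuits are ignored, and all values are illustrative.

```python
import math

def crossbar_mac(G, v):
    """Ideal linear crossbar: column current I_j = sum_i G[i][j] * v[i]."""
    return [sum(G[i][j] * v[i] for i in range(len(G))) for j in range(len(G[0]))]

def cell_current(g, v, a=3.0):
    """Hypothetical non-linear cell I-V: I = g * sinh(a*v) / a (~ g*v for small a*v)."""
    return g * math.sinh(a * v) / a

def analog_voltage_mac(G, x, v_read=0.3):
    """Input x_i applied as a read-voltage amplitude x_i * v_read."""
    return [sum(cell_current(G[i][j], x[i] * v_read) for i in range(len(G)))
            for j in range(len(G[0]))]

def pulse_count_mac(G, x, v_read=0.3, t_pulse=1.0):
    """Input x_i (non-negative integer) applied as x_i identical pulses at v_read;
    the integrated column charge is linear in x_i despite the non-linear I-V."""
    return [sum(x[i] * cell_current(G[i][j], v_read) * t_pulse for i in range(len(G)))
            for j in range(len(G[0]))]

G = [[1.0, 0.5], [0.2, 0.8], [0.6, 0.1]]   # 3 input rows x 2 output columns
x = [1, 2, 3]
ideal = crossbar_mac(G, x)
analog = analog_voltage_mac(G, x)
pulsed = pulse_count_mac(G, x)
print("ideal column ratio:", ideal[0] / ideal[1])
print("analog voltage    :", analog[0] / analog[1])
print("pulse count       :", pulsed[0] / pulsed[1])
```

Because every pulse is read at the same voltage, the non-linearity only contributes a common scale factor, so the pulse-count output preserves the ideal ratio between columns while the amplitude-encoded output does not.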
During the update, write pulses are applied simultaneously across multiple rows and columns. The NVM cells are updated in parallel, with each cell's resistance change being a function of the voltages on its row and column. Different training algorithms exhibit different immunity to weight update non-idealities and should therefore be studied on a case-by-case basis. Weight update non-idealities can affect both the final training result and the training convergence speed. Usually, deterministic effects such as weight update non-linearity and limited dynamic range have more impact on the final training accuracy. Stochastic effects such as device-to-device and cycle-to-cycle variations (when not too large) sometimes correlate with convergence speed. Section 4 reviews the current state-of-the-art non-volatile memory devices used in neuromorphic hardware for applications ranging from biology-based learning models to conventional machine learning algorithms solved using neural networks. Sections 5 and 6 provide a more focused overview of the device-level trade-offs involved in hardware acceleration of neural network architectures using analog in-memory MAC operations.
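The parallel outer-product update can be sketched as follows: row i carries x_i, column j carries δ_j, and each cell moves by a step proportional to their product. Clipping to a fixed conductance window stands in for the limited dynamic range, and the optional multiplicative Gaussian noise stands in for cycle-to-cycle variation; the range, learning rate and noise model are hypothetical.

```python
import random

G_MIN, G_MAX = 0.0, 1.0   # hypothetical normalized conductance window (dynamic range)

def outer_product_update(G, x, delta, lr=0.1, c2c_sigma=0.0, rng=None):
    """In-place rank-1 update G[i][j] += lr * x[i] * delta[j], clipped to [G_MIN, G_MAX].
    c2c_sigma adds hypothetical multiplicative cycle-to-cycle noise per cell."""
    rng = rng or random.Random(0)
    for i, xi in enumerate(x):
        for j, dj in enumerate(delta):
            step = lr * xi * dj
            if c2c_sigma:
                step *= 1.0 + rng.gauss(0.0, c2c_sigma)
            G[i][j] = min(G_MAX, max(G_MIN, G[i][j] + step))
    return G

G = [[0.95, 0.5], [0.5, 0.02]]
outer_product_update(G, x=[1.0, 0.5], delta=[1.0, -1.0], lr=0.1)
print(G)   # cells (0,0) and (1,1) saturate at the window edges
```

The two saturated cells illustrate the deterministic dynamic-range effect: the intended update is partly lost, whatever the training algorithm.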

## 4 Review of the State-of-the-art Devices

This section provides an overview of the emerging non-volatile memory (NVM) technologies that have been utilized as analog synaptic weights in neural networks. Inference and online learning require different sets of characteristics from NVM devices, and they are discussed separately. The desired properties of an NVM device used as an analog synaptic weight that performs multiply-and-accumulate (MAC) operations for inference are: a large dynamic range of resistance with a high ( $100\text{ k}\Omega$  –  $1\text{ M}\Omega$ ) low-resistance-state value, a large range of resistance change when programmed with identical pulses in both the SET and RESET processes, a large number of distinguishable resistance levels, and CMOS-logic-compatible switching voltages. For online learning, where the weights are updated often, retention is not a major concern, but high endurance is desired along with nanoscale switching. For offline inference, where the weights are updated only occasionally using off-chip learning, good retention is also required. No single device has yet demonstrated all the desired properties. In this paper, we provide a brief review of the three most promising non-volatile memory technologies as they are being utilized in neuromorphic applications.
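One motivation for a high low-resistance-state value is the current summed on a shared column during a parallel read, which must be delivered by the drivers, digitized by the periphery and carried by the wires (causing IR drop). The sketch below computes the worst case (all cells in LRS); the array size of 128 rows and the 0.2 V read voltage are assumptions for illustration.

```python
def column_read_current(n_rows, v_read, r_on):
    """Worst-case column current when all n_rows cells are in LRS and read in parallel."""
    return n_rows * v_read / r_on

for r_on in (10e3, 100e3, 1e6):                       # LRS resistance in ohms
    i_col = column_read_current(128, 0.2, r_on)       # hypothetical 128 rows, 0.2 V read
    print(f"R_LRS = {r_on / 1e3:6.0f} kOhm -> I_col = {i_col * 1e6:8.1f} uA")
```

Raising the LRS from 10 kΩ to 1 MΩ reduces the worst-case column current from milliamperes to tens of microamperes, which relaxes the constraints on array size and peripheral circuits.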

### 4.1 Resistive Random Access Memory (RRAM)

Among the emerging NVM technologies, the main advantages of RRAM for neuromorphic applications, especially for the MAC operations of neural networks, are scalability, moderate switching speed, and low energy consumption. The main challenges for RRAM are achieving CMOS-compatible switching voltages and high endurance. Moreover, switching, especially the SET operation, is abrupt, which makes it difficult to achieve gradual resistance control by repeatedly applying the same programming pulse. While gradual RESET is possible, RRAMs suffer from non-linear switching during both SET and RESET. Asymmetry is also observed between the SET and RESET directions. This inherent non-linearity and asymmetry in switching have a negative impact on the accuracy of the neural network (NN) [49]–[51]. Another challenging issue in designing NNs with RRAM is device-to-device and cycle-to-cycle variation. While some cycle-to-cycle variation can be tolerated in inference, low device-to-device variation is desirable for large arrays. The trade-offs between different design constraints and how they impact learning and inference accuracy are discussed in detail in section 5. This section provides an overview of the state-of-the-art RRAM devices utilized for NN applications.
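Non-linearity and asymmetry are often described with an exponential-saturation model of conductance versus pulse number, of the kind used in device-to-system simulators. The form below, its mirrored depression branch, and all parameter values are assumptions for illustration: a smaller `a` means stronger non-linearity, and using different `a` values for SET and RESET models asymmetry.

```python
import math

G_MIN, G_MAX, P_MAX = 1.0, 10.0, 64   # hypothetical conductance range (uS) and pulse budget

def _scale(a):
    """Prefactor chosen so that P_MAX pulses span exactly [G_MIN, G_MAX]."""
    return (G_MAX - G_MIN) / (1.0 - math.exp(-P_MAX / a))

def g_potentiation(p, a):
    """Conductance after p identical SET pulses starting from G_MIN."""
    return G_MIN + _scale(a) * (1.0 - math.exp(-p / a))

def g_depression(p, a):
    """Conductance after p identical RESET pulses starting from G_MAX."""
    return G_MAX - _scale(a) * (1.0 - math.exp(-p / a))

def max_deviation_from_linear(a):
    """Largest gap between the potentiation curve and an ideal linear ramp."""
    return max(abs(g_potentiation(p, a) - (G_MIN + (G_MAX - G_MIN) * p / P_MAX))
               for p in range(P_MAX + 1))

for a in (10, 50, 200):
    first = g_potentiation(1, a) - g_potentiation(0, a)
    last = g_potentiation(P_MAX, a) - g_potentiation(P_MAX - 1, a)
    print(f"a={a:3d}: first step {first:.3f} uS, last step {last:.3f} uS, "
          f"max deviation {max_deviation_from_linear(a):.3f} uS")
```

A strongly non-linear device takes large steps near G_MIN and tiny steps near G_MAX, so the same weight-update request produces different conductance changes depending on the current state.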

Metal oxide RRAM is a simple metal-insulator-metal (MIM) structure, where the insulator layer is typically a transition metal oxide. The metal oxide layer can be a single switching layer or be composed of multiple layers whose interfaces are engineered to have the desired properties. Metal oxide RRAMs are also known as valence change memory (VCM), because the resistive switching is caused by the movement of oxygen vacancy defects; these are anion-based memory devices. The other type of RRAM is a cation-based memory device, also known as an electrochemical metallization (ECM) cell, in which switching is caused by the diffusion of metal cations from the anode metal contact into the solid electrolyte; it is discussed in the next sub-section. Such cells, in which the metal cations form a conductive bridge type filament, are also termed “Conductive Bridging Random Access Memory (CBRAM)”.

The physics of RRAM devices has been explained by a variety of switching mechanisms, which have been investigated extensively by the research community [52]–[58]. The details of the switching mechanism are still an active area of research. The most common mechanism is filamentary switching. Here, the set process from the high resistance state (HRS) of the pristine oxide involves soft breakdown of the dielectric, creating a filamentary conduction path of oxygen vacancies that results in the low resistance state (LRS). The reset process switches the device from the LRS back to the HRS through the recombination of oxygen vacancies with oxygen ions that migrate from the electrode/oxide interfacial reservoir when the bias polarity is reversed relative to the set operation. Fig. 3 shows a schematic of the resistive switching mechanism of a binary oxide RRAM.

For RRAMs to be used in neural networks as weight storage, it is often desirable to store analog values, essentially an extreme case of multi-bit operation, akin to a multi-level cell (MLC) with many more levels than currently implemented (typically 2 or 3 bits per cell are used in digital non-volatile memories). Numerous RRAM oxide materials have been shown to be capable of multi-bit operation, e.g.  $\text{Cu}_x\text{O}$  [59],  $\text{TiO}_x$  [60],  $\text{HfO}_x$  [61],  $\text{WO}_x$  [62] and  $\text{TaO}_x$  [63]. One of the early works demonstrating multi-bit operation was a  $\text{TiO}_x$  RRAM [60], where 5 resistance levels were achieved by varying the amplitude of 5 ns voltage pulses. The data retention was 256 h at 85 °C, but the endurance was only  $2 \times 10^6$  cycles. Lee et al. have also shown 5 resistance levels without verification for a  $\text{TiN}/\text{TiO}_x/\text{HfO}_x/\text{TiN}$  structure [61]. In the set process, multi-level LRS is obtained by changing the set current compliance, which modulates the filament diameter or the number of filaments. The compliance-dependent multi-level resistance states that result from modulation of the filament size are explained in detail by Chae et al. [64] and Zhao et al. [65]. In the reset process, multi-level HRS is obtained by controlling the reset stop voltage. Using Ti as the oxygen scavenging layer, this structure provides moderately fast operation at 5 ns, with a retention of 10 years at 200 °C. While its endurance of  $10^6$  cycles is enough for training on a small dataset such as MNIST [66], it is not sufficient for large-scale networks with many training examples. Lee et al. reported one of the highest endurances ( $10^{12}$  cycles at 10 ns switching speed) in a memory device made from an asymmetric  $\text{Ta}_2\text{O}_{5-x}/\text{TaO}_{2-x}$  bilayer structure [67]. Controlling the resistance of the  $\text{TaO}_{2-x}$  base layer provides a means to control the device resistance. However, the switching voltage is rather high ( $V_{\text{set}} = -4.5\text{V}$ ,  $V_{\text{reset}} = +6\text{V}$  at 10 ns pulses), and multi-level switching is not reported in that work.
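A back-of-the-envelope count shows why ~10^6 cycles can suffice for MNIST but not for larger datasets. The sketch assumes every device is written once per mini-batch (an upper bound that ignores skipped or thresholded updates); the training-set sizes are the standard MNIST and ImageNet-1k figures, while the epoch counts are illustrative assumptions.

```python
def writes_per_device(n_train, epochs, batch_size=1):
    """Upper bound on write events per synaptic device, assuming every weight
    is updated once per mini-batch."""
    updates_per_epoch = -(-n_train // batch_size)   # ceiling division
    return updates_per_epoch * epochs

mnist = writes_per_device(60_000, epochs=10)          # MNIST training set, 10 epochs assumed
imagenet = writes_per_device(1_281_167, epochs=90)    # ImageNet-1k training set, 90 epochs assumed
print(f"MNIST:    {mnist:.2e} writes per device")
print(f"ImageNet: {imagenet:.2e} writes per device")
```

Larger mini-batches reduce the count proportionally, but per-sample (batch size 1) updates, which map naturally onto the parallel outer-product scheme, quickly exceed 10^6 writes on large datasets.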
Nitrogen doping of the  $\text{TaO}_x$  switching material has been shown to improve multi-bit operation by reducing both the switching voltage and the resistance variability [68]. Misha et al. studied the effect of N doping in  $\text{TaO}_x$  and reported a device with 8 levels of resistive switching [68]. Fig. 4 shows the mechanism by which nitrogen incorporated at oxygen vacancy sites confines the filament. Nitrogen doping of the  $\text{TaO}_x$  film is reported to reduce the variability of the switching voltage and resistance by suppressing excess conduction paths: nitrogen captures oxygen ions during bias application, so that the filament forms in a confined, localized region (Fig. 4c). The resulting reduced variability in filament formation (Fig. 4d) for different compliance currents yields more resistance levels, with the optimized doping giving 8 levels of switching that remain uniform over 50 cycles per level.
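Multi-level programming is often contrasted with iterative program-and-verify (write-verify), where the cell is read after each pulse and pulsed again until it lands inside a target window. The sketch below is a generic version of that loop with a hypothetical device response (each pulse moves G by a noisy step); the step-halving on overshoot is an illustrative design choice, not a scheme from the cited works.

```python
import random

def program_verify(target, tol, g_init, step=0.5, max_pulses=200, sigma=0.2, seed=0):
    """Pulse the cell toward `target` until |G - target| <= tol.
    Device model (hypothetical): each pulse moves G by step * (1 + Gaussian noise)."""
    rng = random.Random(seed)
    g, last_dir = g_init, 0.0
    for n in range(max_pulses):
        if abs(g - target) <= tol:            # verify (read) step
            return g, n
        direction = 1.0 if g < target else -1.0   # SET if too low, RESET if too high
        if last_dir and direction != last_dir:
            step *= 0.5                       # overshoot: use a gentler pulse next time
        last_dir = direction
        g += direction * step * max(0.0, 1.0 + rng.gauss(0.0, sigma))
    return g, max_pulses                      # did not converge within the budget

for level in (1.0, 2.5, 4.0, 8.0):            # hypothetical target conductance levels
    g, pulses = program_verify(level, tol=0.05, g_init=1.0)
    print(f"target {level:4.1f}: reached {g:.3f} after {pulses} pulses")
```

Verification trades extra read and write pulses for tighter level placement, which is why results reported "without verification" are a stricter test of the intrinsic multi-level capability of a material stack.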

The SET operation in filamentary RRAM is inherently abrupt. This results in a nonlinear change of conductance with the number of switching pulses, which has a negative impact on the accuracy of machine learning tasks. RESET, on the other hand, is more gradual, as shown in Fig. 5a for the  $\text{TiN}$  (BE)/ $\text{HfO}_2$ /Ti/ $\text{TiN}$  (TE) device stack [69]. In the modified device, a barrier layer is inserted on the bottom electrode to avoid abrupt switching, which results in a linear, gradual SET/RESET process. Fig. 5c compares the synaptic behavior of a  $\text{TiN}$  (BE)/ $\text{HfO}_2$ /Ti/ $\text{TiN}$  (TE) device and an  $\text{Al}$  (BE)/ $\text{AlO}_x/\text{HfO}_2$ /Ti/ $\text{TiN}$  (TE) device. In the bilayer system, the oxygen vacancy mobility differs between the two layers. During the RESET process, the dissolution of vacancies is limited by the  $\text{AlO}_x$  layer because of its lower oxygen vacancy mobility. Instead, the conductance of the conductive filament (CF) is modulated through the filament width (Fig. 5b). This results in gradual resistive switching at the expense of a low on/off ratio, because modulating the filament width changes the conductance only in proportion to the filament cross-section (Ohm's law), whereas modulating the tunnel barrier along the filament length changes the current exponentially. Pattern recognition accuracy increases from 20% for the  $\text{HfO}_2$  device to close to 90% for the bilayer device.
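The on/off-ratio argument above can be made concrete with a toy comparison. This is a minimal sketch, assuming an ohmic cylindrical filament for width modulation and a simple exponential tunnelling law for gap modulation; all parameter values (`RHO_CF`, `L_CF`, `KAPPA`, `G_CONTACT`) are hypothetical and chosen only for illustration, not taken from the cited devices.

```python
import math

# Hypothetical, illustrative parameters (not from the cited devices).
RHO_CF = 1e-5      # filament resistivity, ohm*m
L_CF = 5e-9        # filament length, m
KAPPA = 1e10       # tunnelling decay constant, 1/m (about 1 per angstrom)
G_CONTACT = 1e-4   # conductance when the tunnel gap is closed, S


def width_modulated_conductance(width_m: float) -> float:
    """Ohmic filament of fixed length: G = A / (rho * L) with A = pi * (w/2)^2."""
    area = math.pi * (width_m / 2) ** 2
    return area / (RHO_CF * L_CF)


def gap_modulated_conductance(gap_m: float) -> float:
    """Filament separated from the electrode by a tunnel gap: G ~ G0 * exp(-2 * kappa * gap)."""
    return G_CONTACT * math.exp(-2 * KAPPA * gap_m)


# Doubling the width raises G by only 4x (area scaling) ...
width_ratio = width_modulated_conductance(10e-9) / width_modulated_conductance(5e-9)
# ... whereas shrinking the tunnel gap by 0.5 nm raises G by exp(10), about 2.2e4x.
gap_ratio = gap_modulated_conductance(0.5e-9) / gap_modulated_conductance(1.0e-9)
print(f"width 5->10 nm: x{width_ratio:.1f}, gap 1.0->0.5 nm: x{gap_ratio:.3g}")
```

The contrast shows why width modulation gives gentle, analog-friendly steps but a small dynamic range, while gap modulation gives a large on/off ratio with very abrupt transitions.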

Wu et al. propose that abrupt switching in  $\text{HfO}_x$  can be explained by the positive feedback of the electric field on the formation of the conductive filament (CF), which accelerates the formation of one single dominant CF [70]. The formation and rupture of this dominant filament changes the total conductance by a large amount, resulting in abrupt switching. A transition from abrupt to analog switching is found at higher temperature, achieved by confining heat in the switching layer with a thermal enhanced layer (TEL) [70]. Confining heat in the switching layer allows the oxygen vacancies to redistribute uniformly, so that multiple weak CFs form instead of one dominant filament. This yields better analog switching behavior, as shown in Fig. 6, where a switching window of more than 10 times is demonstrated for 50 ns switching pulses.
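Why one dominant CF gives abrupt switching while many weak CFs give analog switching can be illustrated with a toy model. This is a minimal sketch under a loudly simplified assumption: the total conductance is split equally among `n_filaments` parallel CFs, and each unformed CF completes with a fixed probability `p_form` per pulse. It is not the physical model of Wu et al.; the function names and parameters are hypothetical.

```python
import random


def conductance_trace(n_filaments, g_total, n_pulses, p_form, seed=0):
    """Toy model: g_total is shared equally by n_filaments parallel CFs.

    On every pulse, each CF that has not yet formed completes with
    probability p_form; the trace records the total conductance after each pulse.
    """
    rng = random.Random(seed)
    formed = 0
    trace = []
    for _ in range(n_pulses):
        newly_formed = sum(rng.random() < p_form for _ in range(n_filaments - formed))
        formed += newly_formed
        trace.append(formed * g_total / n_filaments)
    return trace


def largest_step(trace):
    """Largest conductance jump between consecutive pulses (starting from G = 0)."""
    previous = [0.0] + trace[:-1]
    return max(b - a for a, b in zip(previous, trace))


single = conductance_trace(n_filaments=1, g_total=1.0, n_pulses=40, p_form=0.1)
multi = conductance_trace(n_filaments=20, g_total=1.0, n_pulses=40, p_form=0.1)
print("largest step, one dominant CF:", largest_step(single))
print("largest step, 20 weak CFs:   ", largest_step(multi))
```

With a single CF the conductance can only jump between 0 and the full value, whereas with 20 weak CFs it climbs in small increments of 1/20 of the range.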

An amorphous Si (a-Si) barrier layer has been shown to act as an oxygen scavenging layer, introducing a significant concentration of oxygen vacancies in the switching layer ( $TiO_2$ ) [71]. This results in analog, non-filamentary switching with better device-to-device uniformity than an  $AlO_x$  barrier layer provides. However, the switching voltage is relatively large (~6 V), because the relatively thick a-Si layer causes a large voltage drop across it.

Besides material innovation for improved analog switching, three-dimensional device architecture is another important research direction because it provides area scaling and increased functionality. 3-D vertical RRAM (VRRAM) has been demonstrated by several groups (a typical structure is shown in Fig. 7) [72]–[77]. Using 3D VRRAM, Li et al. introduced a brain-inspired computational framework capable of one-shot learning, known as hyperdimensional (HD) computing [78]. Owing to the energy-efficient VRRAM cells and dense connectivity, this architecture reduces total energy consumption by 52.2% while occupying 412 times less area than a low-power digital design that uses registers as memory. Moreover, this architecture is resilient to RRAM endurance failure because of device-architecture co-optimization.

RRAM arrays have been used successfully for various machine learning tasks. Park et al. proposed a PCMO-based RRAM synaptic device (device stack Pt/ $AlO_x$ / $TiN_x$ / $Pr_{0.7}Ca_{0.3}MnO_3$ /Pt from top to bottom) that exhibits the necessary gradual and symmetric conductance change [79]. Using a single-layer perceptron with 192 synapses, this device array can learn and recognize human thought patterns corresponding to three vowels from EEG signals. Prezioso et al. demonstrated transistor-free (1R-type) metal oxide RRAM crossbar arrays for integrated operation of neural networks [80]. The bilayer device stack Pt/Ti/ $TiO_{2-x}$ / $Al_2O_3$ /Pt is used for an integrated crossbar array of  $12 \times 12$  devices. This single-layer perceptron network can be trained to perfectly classify  $3 \times 3$ -pixel black-and-white images into three classes. L. Gao et al. demonstrated a convolution kernel operation (i.e., edge detection) on an MNIST image using a  $12 \times 12$  crossbar array of  $HfO_x$  RRAM [81]. A recent work by Yao et al. demonstrated grey-scale human face classification using a  $128 \times 8$  array with parallel online training [82]. The network, designed with an optimized  $TiN/TaO_x/HfAlO_x/TiN$  metal oxide device stack, consumes 1000 times less energy than an implementation of the same network on an Intel Xeon Phi processor with off-chip weight storage. While these demonstrations use NVM as the synaptic device, all of them rely on circuitry external to the NVM (either in software or in hardware); none has the NVM integrated with the peripheral control circuits.
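The crossbar demonstrations above rely on the array computing a vector-matrix product in place. The sketch below shows the ideal read-out, assuming no wire resistance, no sneak paths and columns held at 0 V; the function name and the example conductances are hypothetical.

```python
def crossbar_column_currents(conductance, row_voltages):
    """Ideal crossbar read-out (no wire resistance, no sneak paths, columns at 0 V).

    conductance[i][j] is the device between row i and column j (siemens);
    row_voltages[i] is the read voltage on row i (volts).
    Each device contributes I = G * V (Ohm's law) and the contributions on a
    column add up (Kirchhoff's current law), giving I_j = sum_i G_ij * V_i.
    """
    n_cols = len(conductance[0])
    currents = [0.0] * n_cols
    for g_row, v in zip(conductance, row_voltages):
        for j, g in enumerate(g_row):
            currents[j] += g * v
    return currents


# 2 inputs x 2 outputs: weights stored as conductances, inputs applied as voltages.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
print(crossbar_column_currents(G, V))  # approximately [7e-07, 1e-06] A
```

In a real array, line resistance, device nonlinearity and sneak currents in transistor-free (1R) arrays perturb this ideal result.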

### 4.2 Conductive Bridging Random Access Memory (CBRAM)

Filamentary resistive switching devices in which the filament is composed of metal cations instead of oxygen vacancies are termed "Conductive Bridging RAM" or CBRAM. A CBRAM device consists of one electrochemically active electrode (e.g., Ag or Cu, which is easily oxidized under an external positive bias) and one electrochemically inert electrode (e.g., Pt, Ir, Au, W, TiN). The switching material between these two electrodes can be a solid electrolyte (chalcogenide) or an oxide. The first CBRAM-like switching device was proposed by Hirose et al. [83] in 1976, where switching occurred via a Ag dendrite in a Ag-doped  $\text{As}_2\text{S}_3$  film in a Ag/ $\text{As}_2\text{S}_3$ /Mo structure. Germanium (Ge) based chalcogenides ( $\text{GeSe}_x$  [84],  $\text{GeS}_2$  [85], GeTe [86]) have been widely studied as CBRAM switching materials, since Cu and Ag ions show high mobility in them. The basic switching mechanism in CBRAM involves an electrochemical reaction at the active anode metal (Ag or Cu) that oxidizes the metal into cations. These cations drift through the solid electrolyte switching layer under the electric field and are reduced to metal atoms near the inert electrode. This process forms a metallic conductive bridge from anode to cathode when the device switches from HRS to LRS (SET), hence the name CBRAM. Reversing the voltage polarity electrochemically dissolves the conductive bridge and resets the device from LRS to HRS. The growth kinetics depend on the electrode and switching materials and therefore differ between oxide and non-oxide switching materials. Besides chalcogenides, oxides are widely used for CBRAM, e.g.,  $\text{SiO}_2$  [87],  $\text{ZrO}_2$  [88],  $\text{Ta}_2\text{O}_5$  [89],  $\text{GeO}_x$  [90],  $\text{TiO}_2$  [91]. Ag-doped amorphous Si (a-Si) has also been reported [92]. CBRAM usually offers low switching voltage (<2 V), fast switching (~ns), high scalability and low-power operation [93], [94]. However, the switching is highly stochastic and abrupt. This is a challenge for MAC operation in NNs, where gradual and linear conductance switching is desirable. Achieving high endurance and retention is also a challenge. The main reason for these challenges is the high mobility of the metal cations, whose diffusion barrier is relatively low in traditional electrolytes. 
To control Cu or Ag diffusion and improve switching uniformity, bilayer materials that create an additional cation diffusion barrier have been studied, e.g.,  $\text{MoO}_x/\text{GdO}_x$  [95],  $\text{Ti}/\text{TaO}_x$  [96],  $\text{GeSe}_x/\text{TaO}_x$  [97], Cu-Te/ $\text{Al}_2\text{O}_3$  [98] and TiW/ $\text{Al}_2\text{O}_3$  [99].

For example, Aratani et al. demonstrated  $>10^7$ -cycle endurance from a Cu-Te/ $\text{GdO}_x$  bilayer CBRAM [100]. Four conductance levels were obtained by setting appropriate compliance currents. Precise control of cation injection into the switching layer is the key to improving reliability [100]. Besides a bilayer structure, a series transistor can also be an effective way to control cation injection. This technique, however, is not suitable for the large 2D cross-point architecture that is essential for a Kirchhoff's-law-based vector-matrix multiplier in NN applications. Recently, Fujii et al. demonstrated that confining the area of the switching layer in a CBRAM-type device is a promising way to control Cu injection [101]. Fig. 8 shows that when the lateral dimension of the  $\text{SiO}_2$  switching layer in a Cu/ $\text{SiO}_2/\text{Pt}$  CBRAM is reduced from 100 nm to 30 nm, endurance improves by two orders of magnitude. The improvement originates from the limited supply of Cu ions during the set operation, owing to the spatial limitation of the Cu top electrode, which prevents excessive Cu ions from moving into the  $\text{SiO}_2$ . It is also reported that reducing the Cu electrode to sub-20 nm could improve data retention because Cu movement within the switching layer is restricted.

Using a physical model of CBRAM, Yu et al. showed that CBRAMs can emulate the function of a biological synapse, exhibiting spike-timing-dependent plasticity (STDP), a key observation from biology [102]. An interesting alternative to devices with deterministic multilevel resistance switching is to use devices with binary switching together with a stochastic STDP learning rule. At the system level, this alternative is functionally equivalent to deterministic multilevel synapses [103]. Such stochastic binary synapses have been applied to both supervised [104] and unsupervised [105] NNs. In this scheme, stochastic switching in resistive memories makes the SET/RESET process probabilistic. The inputs and weights of the NN can be converted to Bernoulli distributions [106] that represent the stochastically switched CBRAM. The formation of one dominant cation filament, owing to the higher diffusivity of the metal, is the reason for abrupt switching and variability in CBRAM.
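The claim that stochastic binary synapses can stand in for multilevel ones can be checked with a short simulation. This is a minimal sketch, assuming each OFF device switches ON with a fixed probability `p_set` per potentiation pulse (an independent Bernoulli trial) and that ON devices stay ON; the function names are hypothetical. The fraction of ON devices in a population then behaves like a graded weight.

```python
import random


def potentiate_population(n_devices, p_set, n_pulses, seed=0):
    """Apply n_pulses potentiation events to n_devices binary (0/1) synapses.

    Each OFF device switches ON with probability p_set per pulse (a Bernoulli trial);
    ON devices stay ON. Returns the fraction of devices that are ON.
    """
    rng = random.Random(seed)
    states = [0] * n_devices
    for _ in range(n_pulses):
        for k in range(n_devices):
            if states[k] == 0 and rng.random() < p_set:
                states[k] = 1
    return sum(states) / n_devices


def expected_on_fraction(p_set, n_pulses):
    """Analytical mean: a device is still OFF after n pulses with probability (1 - p)^n."""
    return 1 - (1 - p_set) ** n_pulses


for n in (1, 3, 10):
    sim = potentiate_population(10_000, p_set=0.2, n_pulses=n)
    print(n, round(sim, 3), round(expected_on_fraction(0.2, n), 3))
```

Each device is binary, but the population average tracks the smooth curve 1 - (1 - p)^n, which is the sense in which the two schemes are equivalent at the system level.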

Besides stochastic switching, analog resistance modulation in CBRAM-based synaptic devices has also been shown. Jo et al. proposed a CBRAM device with a co-sputtered Ag and Si layer with a properly designed Ag/Si mixture-ratio gradient, which leads to the formation of Ag-rich (high-conductivity) and Ag-poor (low-conductivity) regions [92]. Ag nanoparticles embedded in the Si medium form a uniform conduction front between the Ag-rich and Ag-poor regions. Under applied bias, this device shows reliable analog switching with gradual conductance change over successive pulses. The analog switching arises from the gradual movement of the incorporated Ag nanoparticles, which allows current conduction by tunneling across Ag nanoparticles rather than through a continuous metallic filament. Continuous conductance modulation, as shown in this work for STDP-like synaptic operation, is also essential for analog weight storage for MAC operation in NN applications.

To combine the relatively higher reliability of vacancy-based RRAM with the low-voltage operation of CBRAM, Yoon et al. proposed an Ag-doped  $Ta_2O_5$  resistive switching device with a tantalum (Ta) top electrode and a ruthenium (Ru) bottom electrode [107]. This device does not operate as a traditional CBRAM, since the TE does not supply the Ag cations, which remain embedded in the oxide. A CMOS-compatible switching voltage (0.7 V) is reported with  $5 \times 10^7$  endurance cycles at 100 ns pulses. The device also shows  $9.936 \times 10^6$  s retention at room temperature and electroforming-free operation, making it one of the most promising devices for neuromorphic applications. The Ru BE plays a special role in lowering the switching current and enabling forming-free operation compared with a Pd BE, as shown in Fig. 9. Ag and Ru have no mutual solubility, so the Ru BE repels Ag atoms. This allows Ag to form nanoclusters inside  $Ta_2O_5$ , dispersed relatively close to each other, resulting in conductive tunneling paths (CTPs) between the Ag nanoclusters. Unlike in CBRAM, no continuous cation filament is formed, which keeps the device forming-free. In Pd BE devices, however, Ag and Pd can form a single uniform phase, so Ag is attracted to the BE and becomes uniformly distributed. This prevents cluster formation. Without the CTPs, switching in the Pd BE device occurs through oxygen vacancies, and forming is therefore needed. This work thus exemplifies the need for interface engineering between the electrodes and the switching material to obtain the desired switching performance and reliability.

### 4.3 Phase Change Memory (PCM)

Phase change memories (PCM) are a class of non-volatile memory devices in which the large difference in electrical resistivity between the amorphous (high-resistivity) and crystalline (low-resistivity) phases of certain materials is used to represent memory states. The phase transformation occurs through Joule heating by the current that flows through the phase change material when a voltage pulse is applied. The resistance can also be modulated by applying voltage pulses of specific amplitude and duration, which produce amorphous regions of different sizes and hence resistances between those of the fully amorphous and fully crystalline states. This behavior enables multi-level resistance operation of PCM, a feature essential for neuromorphic applications.

Chalcogenides are widely used as phase change materials in current PCM technology because of their strong resistance contrast, fast crystallization and high crystallization temperature. More specifically, GST ( $Ge_2Sb_2Te_5$ ), which lies on the pseudo-binary line between  $GeTe$  and  $Sb_2Te_3$  in the phase diagram, is one of the most commonly used materials for memory and synaptic device applications [108]. For PCM devices to be used as synaptic devices, a high dynamic range (ratio between high and low resistance states) is desired. Neuromorphic applications also require gradual changes in device resistance under constant voltage pulses. The SET process is suitable for this, since repetitive pulses progressively crystallize the high-resistivity amorphous state and gradually change the resistivity. The RESET process, however, is quite abrupt, since "melt and quench" is required for the crystalline-to-amorphous transition. Therefore, SET and RESET resistivity switching in PCM is not symmetric.

PCM is one of the most mature non-volatile memory technologies and has therefore attracted considerable interest from the research community as an electronic synapse in neuromorphic computing systems. Kuzum et al. first demonstrated a single-element phase change electronic synapse capable of both modulating the time constant and realizing different STDP types [109]. Using optical programming, Wright et al. demonstrated arithmetic operations such as addition, multiplication, subtraction and division in PCM devices [110]. Since amorphization of the phase change material is more abrupt and power-consuming than crystallization, Suri et al. proposed a "2-PCM" synapse circuit in which each synapse is represented by 2 PCM devices connected in a complementary way to the post-synaptic neuron (Fig. 10) [111]. One device implements long-term potentiation (LTP, an increase in conductance) and the other implements long-term depression (LTD, a decrease in conductance), which makes STDP learning possible using identical crystallization pulses alone. The 2-PCM approach also allows both positive and negative weights. Suri et al. also improved the synaptic characteristics (SET/RESET current reduction and an increase in the number of resistance states) of standard GST-based PCM devices using a thin interfacial HfO<sub>2</sub> layer, which increases the dynamic switching range by improving the crystallization kinetics of the GST film; the interfacial layer can lower the activation energy associated with crystallization and amorphization [112].
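The 2-PCM idea can be sketched as a pair of devices whose conductance difference is the weight. This is a minimal sketch, assuming each identical crystallization pulse closes a fixed fraction `alpha` of the remaining gap to `g_max` (a hypothetical saturating model of gradual SET); the class name and parameters are hypothetical. It does not model the occasional reset/refresh that a practical implementation would likely need once both devices saturate.

```python
class TwoPCMSynapse:
    """Hypothetical 2-PCM synapse: weight = G_plus - G_minus.

    Both devices only ever receive identical partial-SET (crystallization) pulses,
    modelled here as a saturating step toward g_max; RESET is never used.
    """

    def __init__(self, g_min=0.0, g_max=1.0, alpha=0.1):
        self.g_min, self.g_max, self.alpha = g_min, g_max, alpha
        self.g_plus = g_min   # LTP device
        self.g_minus = g_min  # LTD device

    def _crystallize(self, g):
        # Each pulse closes a fixed fraction alpha of the remaining gap to g_max.
        return g + self.alpha * (self.g_max - g)

    def ltp(self):
        self.g_plus = self._crystallize(self.g_plus)

    def ltd(self):
        self.g_minus = self._crystallize(self.g_minus)

    @property
    def weight(self):
        return self.g_plus - self.g_minus


syn = TwoPCMSynapse()
for _ in range(3):
    syn.ltp()
print(round(syn.weight, 3))  # 0.271 = 1 - 0.9**3
syn.ltd()
print(round(syn.weight, 3))  # 0.171
```

Depression is realized by crystallizing the second device rather than by an abrupt RESET, and the weight can take either sign.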

The 2-PCM synapse approach has been used by Burr et al. for backpropagation training of a three-layer perceptron, in which 164,885 2-PCM synapses were used for vector-matrix multiplication [113]. In an experimental demonstration, Eryilmaz et al. employed a Hopfield network consisting of 100 synaptic devices and 10 recurrently connected neurons to implement brain-like associative learning [114]. Kim et al. developed a 64k-cell (256×256) PCM array with on-chip neuron circuits capable of continuous in-situ learning, where a novel 2T1R (two-transistor, one-resistor) circuit performs both leaky integrate-and-fire (LIF) and STDP learning asynchronously [115].
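Associative recall in a Hopfield network of the size mentioned above can be sketched in a few lines. This is a generic textbook Hopfield model with Hebbian storage and asynchronous sign updates, not the hardware learning scheme of Eryilmaz et al.; we assume the 100 synapses map onto a 10 × 10 weight matrix with zero diagonal, and the stored pattern is hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian storage: w_ij = sum over patterns of p_i * p_j, with w_ii = 0."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, sweeps=5):
    """Asynchronous sign updates of +/-1 neurons for a fixed number of sweeps."""
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if h >= 0 else -1
    return s


stored = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]   # one 10-neuron pattern
weights = train_hopfield([stored])             # 10 x 10 = 100 synaptic weights
cue = list(stored)
cue[0], cue[4] = -cue[0], -cue[4]              # corrupt 2 of the 10 bits
print(recall(weights, cue) == stored)          # True
```

Starting from a corrupted cue, the network settles back to the stored pattern, which is the associative-memory behavior the hardware demonstration targets.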

Not only supervised but also unsupervised learning has been demonstrated using PCM arrays. Ambrogio et al. demonstrated a 1T1R PCM synaptic array for unsupervised learning [116]. Using the circuit and pulse scheme shown in Fig. 11, visual pattern recognition with 2 or 3 fully connected neuromorphic layers has been shown with high accuracy (95.5%). Recently, Sebastian et al. reported that an unsupervised machine-learning algorithm running on one million phase change memory (PCM) devices successfully found temporal correlations in unknown data streams [117]. This work uses the linear resistance switching property of the multi-level memory device to solve a linear differential equation. These devices use PCM crystallization dynamics to perform both computation (detecting temporal correlations between event-based data streams) and storage of the results, and can be considered "computational memory devices." Applying different non-volatile memory devices to various neuromorphic applications requires trade-offs in device performance and reliability metrics. The next section discusses this topic in detail.

The highlight of section 4 is summarized in Table 1 and Fig. 12.

## 5 Design Trade-off in Non-volatile Memory Devices for Different Neural Network Applications

### 5.1 Retention and Endurance

To capture the correlation between the electrical parameters of synaptic devices and microscopic factors, and to investigate the intrinsic trade-offs between different parameters, researchers have developed Monte Carlo simulation methods for both filamentary and non-filamentary RRAM devices [118], [119]. The simulation by Gao et al. [118], [119] calculates the distribution of electric field, current density and temperature in the local region of the device where resistive switching occurs. The probabilities of generation, migration and recovery of ions or vacancies are then calculated, followed by a stochastic dynamic update of the distribution of ions or vacancies. From the calculated distribution and evolution of ions or vacancies, the device parameters, and hence the device characteristics, can be predicted.
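The "compute probabilities, then stochastically update" loop described above can be illustrated on a drastically simplified system. This is a minimal sketch, not the simulator of Gao et al.: we assume a 1D chain of sites, a uniform field and temperature (no self-consistent field, current or heat calculation), doubly charged vacancies, and an Arrhenius hop probability whose barrier is lowered or raised by half the field work over one hop. All parameter values and function names are hypothetical.

```python
import math
import random

K_B = 8.617e-5  # Boltzmann constant, eV/K


def hop_probability(ea_ev, field_v_per_m, temp_k, direction,
                    hop_m=0.3e-9, attempt_hz=1e13, dt_s=1e-11, charge=2):
    """Probability of one hop in `direction` (+1/-1) during dt_s.

    Arrhenius rate nu * exp(-Ea_eff / kT); the field lowers the barrier for hops
    along +x (and raises it for -x) by half the work charge*E*hop, in eV.
    """
    delta_ev = direction * charge * field_v_per_m * hop_m / 2
    rate = attempt_hz * math.exp(-(ea_ev - delta_ev) / (K_B * temp_k))
    return min(1.0, rate * dt_s)


def mc_step(positions, n_sites, ea_ev, field, temp_k, rng):
    """One stochastic update of all vacancies on a 1D chain (no double occupancy)."""
    occupied = set(positions)
    updated = []
    for x in positions:
        d = rng.choice((1, -1))
        target = x + d
        if (0 <= target < n_sites and target not in occupied
                and rng.random() < hop_probability(ea_ev, field, temp_k, d)):
            occupied.remove(x)
            occupied.add(target)
            x = target
        updated.append(x)
    return updated


rng = random.Random(1)
vacancies = list(range(10, 110, 10))  # 10 vacancies on a 200-site chain
start_mean = sum(vacancies) / len(vacancies)
for _ in range(500):
    vacancies = mc_step(vacancies, 200, ea_ev=0.5, field=1e9, temp_k=300, rng=rng)
print("mean position drift (sites):", sum(vacancies) / len(vacancies) - start_mean)
```

A full simulator would recompute the local field and temperature from the updated vacancy distribution at every step and add generation and recombination events; this sketch keeps only the migration step.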

Non-filamentary RRAM devices, also known as interface switching RRAM devices, are suitable for bidirectional analog switching, but they usually suffer from a trade-off between retention and speed [118]. The resistive switching of non-filamentary RRAM is attributed to the change of an interfacial electronic barrier modulated by oxygen vacancy migration. As shown in Fig. 13 and Fig. 14, if the migration barrier of oxygen vacancies is higher, the device is more stable but also requires more time for programming. Conversely, with a lower migration barrier, the programming speed can be increased, but retention degrades very quickly. In most cases, the high resistance state is the stable state, and the resistance of the low resistance state increases with time. Since non-filamentary RRAM devices have mostly been used as analog synapses for online training [73], [80], [120], the research community has aimed at increasing programming speed without considering retention. Even so, the programming speed is still on the order of microseconds, and few reports exist on data retention of multilevel states at high temperature. For PCM, the trade-off between programming speed and retention can be tuned by modifying the stoichiometry of the GST material with a tungsten dopant [121] or by applying a constant voltage to exploit pre-structural ordering (incubation) effects [122].
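The retention/speed trade-off follows from a single activation energy controlling both processes. This is a minimal sketch, assuming Arrhenius hopping with a hypothetical 1e13 Hz attempt frequency and a hypothetical 0.4 eV barrier lowering under programming bias; the times are single-hop waiting times, meant only to show the qualitative trend, not device-level specifications.

```python
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K


def hop_time(ea_ev, temp_k, delta_ev=0.0, attempt_hz=1e13):
    """Mean waiting time 1/rate for a thermally activated hop over barrier Ea - delta."""
    return 1.0 / (attempt_hz * math.exp(-(ea_ev - delta_ev) / (K_B * temp_k)))


FIELD_LOWERING_EV = 0.4  # hypothetical barrier lowering under the programming bias
program_times, retention_times = [], []
for ea in (0.8, 1.0, 1.2, 1.4):
    t_program = hop_time(ea, 300, FIELD_LOWERING_EV)  # biased, room temperature
    t_retain = hop_time(ea, 358)                      # unbiased bake at 85 degC
    program_times.append(t_program)
    retention_times.append(t_retain)
    print(f"Ea={ea:.1f} eV  program ~{t_program:.1e} s  retention ~{t_retain:.1e} s")
```

Raising the migration barrier lengthens retention and programming time together, so one cannot be improved without penalizing the other when the same mechanism governs both.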

On the other hand, filamentary RRAM devices (including both OxRAM and CBRAM), which have been widely investigated for use as digital non-volatile memory, can achieve both nanosecond programming speed and excellent high-temperature retention. This is because programming in filamentary RRAM originates from oxygen vacancy generation or oxygen interstitial migration, whereas retention degradation is due to oxygen vacancy diffusion [57], [123]. These mechanisms have different activation energies, which removes the intrinsic trade-off between retention and speed. However, filamentary RRAM faces another trade-off, between retention and multilevel switching. In CBRAM, the filament originates from metal-ion interstitial migration, so similar conclusions apply. The main difference may be that the activation energy of metal ions is smaller than that of oxygen vacancies, so CBRAMs are faster but have worse retention.

Generally, the connection and rupture of the conductive filament (CF) cause an abrupt resistance change, so filamentary RRAM devices are best used as binary synapses [124] or single-bit non-volatile memory. Muraoka and Ninomiya et al. proposed a method to distribute the oxygen vacancies more tightly, forming a single strong CF with high oxygen vacancy concentration [123], [125]. With this optimization, oxygen vacancies do not easily diffuse out of the CF region, and even if some do, only a small resistance change is observed. In this case, retention can be improved significantly. Recently, Gao et al. proposed that analog switching could be realized in filamentary RRAM by separating the oxygen vacancies into different locations, forming multiple weak CFs [119]. Each CF contributes only a small portion of the device's total conductance. These CFs are less stable than the CF in a single-CF device, so retention degradation can be observed at high temperature. A similar weak-CF idea was demonstrated in CBRAM using Ag-doped a-Si [92].

An order parameter was introduced to quantify the distribution of oxygen vacancies [119]. The order parameter is defined as the percentage of vacancy-vacancy neighboring pairs in the whole lattice of the switching oxide layer. It can be expressed as  $O_V = \frac{2N_{V-V}}{z C_V N}$ , where  $N_{V-V}$  is the number of vacancy-vacancy neighboring pairs,  $C_V$  is the oxygen vacancy concentration,  $N$  is the total number of oxygen sites in the oxide layer, and  $z$  is the coordination number of the lattice. Since the  $C_V N$  vacancies have  $z C_V N$  neighbor bonds in total and each vacancy-vacancy pair is counted twice among them,  $O_V$  is the fraction of vacancy neighbor sites that are themselves vacancies, ranging from 0 (fully dispersed) to 1 (fully clustered). As shown in Fig. 15, if the order parameter is large (ordered state), meaning a strong CF has formed and the device cannot show good analog switching, retention is high. If the order parameter is small (disordered state), meaning the device is designed for good analog switching and multiple weak CFs may form, resistance fluctuation is observed under high-temperature baking. To improve retention, doping methods or multilayer structures were developed to prevent oxygen vacancies from diffusing away from their original locations [126]. However, doping introduces discrete dopant variations when the device is scaled down.
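The order parameter can be evaluated directly on a vacancy map. This is a minimal sketch, assuming a 2D square lattice with periodic boundaries (so $z = 4$) rather than the 3D oxide lattice of the original simulation; the lattice sizes and vacancy layouts are hypothetical.

```python
def order_parameter(lattice):
    """O_V = 2 * N_VV / (z * C_V * N) on a 2D square lattice with periodic boundaries.

    lattice[i][j] is 1 for an oxygen vacancy and 0 for an occupied oxygen site.
    z = 4 nearest neighbours; C_V * N is simply the number of vacancies.
    """
    rows, cols = len(lattice), len(lattice[0])
    n_sites = rows * cols
    n_vac = sum(sum(row) for row in lattice)
    if n_vac == 0:
        return 0.0
    c_v = n_vac / n_sites
    n_vv = 0
    for i in range(rows):
        for j in range(cols):
            if lattice[i][j]:
                # Count only the right and lower neighbours so each pair is counted once.
                n_vv += lattice[i][(j + 1) % cols]
                n_vv += lattice[(i + 1) % rows][j]
    z = 4
    return 2 * n_vv / (z * c_v * n_sites)


def make_lattice(size, vacancy_sites):
    lattice = [[0] * size for _ in range(size)]
    for i, j in vacancy_sites:
        lattice[i][j] = 1
    return lattice


# 8 vacancies either packed into one compact CF-like block or dispersed on a checkerboard.
cluster = make_lattice(8, [(i, j) for i in range(3, 5) for j in range(2, 6)])
scattered = make_lattice(8, [(i, j) for i in range(8) for j in range(8) if (i + j) % 2 == 0][:8])
print(order_parameter(cluster), order_parameter(scattered))  # 0.625 0.0
```

The compact block (a strong-CF-like configuration) gives a large order parameter, while the dispersed layout (multiple weak, isolated sites) gives zero.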

Endurance is another key parameter for device reliability. So far, few works report the endurance of analog-switching NVM. For binary switching, which was mainly aimed at digital memory, endurance degradation has been extensively investigated [126]. Chen et al. found a trade-off between endurance and retention [127]. For better endurance, the oxygen reservoir layer is very important: it can control the oxygen vacancy concentration in the resistive switching layer and avoid rapid loss of oxygen ions. Besides retention and endurance, read disturbance is another important reliability parameter [128]. In a neural network, read disturbance dictates how many inference operations the network can perform without refreshing the weights, since continuous reading may change the resistance state of the devices and degrade network accuracy. Although there is no clear conclusion yet, it is widely accepted that read disturbance is correlated with retention degradation and is somewhat analogous to a voltage-accelerated retention degradation process [129], [130].

### 5.2 Operating Voltage

Reducing the operating voltage is important for lowering the power consumption of the neural network. Specifically, an operating voltage below 1 V is essential for CMOS-compatible on-chip integration of neuromorphic devices. The read voltage may also determine the total scale of the network, since larger read voltages result in larger read currents. In a highly parallel neural network, the large read current through the bit lines may limit the array size. If the I-V curve is strongly nonlinear, increasing the read voltage significantly increases the read current. However, because of current fluctuations, the read voltage cannot be reduced too much. The current fluctuation mainly comes from random telegraph noise caused by electron trapping/detrapping and oxygen (ion or vacancy) vibration [119], [131], [132]. Typically, only one or a few traps or oxygen vacancies contribute to the current fluctuation, so its amplitude is almost independent of the current level. At a small read voltage, the fluctuation makes up a large portion of the read current and may affect the accuracy of the neural network. Therefore, to keep the read current stable, the read voltage should be kept within a reasonable range and cannot be too small.
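The two constraints above bound the read voltage from both sides. This is a minimal sketch, assuming an ohmic cell (I = G V), a fixed random-telegraph-noise amplitude, and a worst case in which every cell on a bit line is in the same conductance state; the function name and all numbers are hypothetical.

```python
def read_voltage_window(g_cell, rtn_amplitude_a, max_noise_ratio, n_rows, bitline_limit_a):
    """Return (v_min, v_max) for the read voltage, assuming an ohmic cell (I = G * V).

    v_min: a fixed RTN amplitude must stay below max_noise_ratio of the cell current.
    v_max: with all n_rows cells on one bit line at g_cell, the summed current must
    stay below bitline_limit_a (worst case).
    """
    v_min = rtn_amplitude_a / (max_noise_ratio * g_cell)
    v_max = bitline_limit_a / (n_rows * g_cell)
    return v_min, v_max


# Hypothetical numbers: 1 MOhm cell, 1 nA RTN, 1% noise budget, 128 rows, 100 uA bit-line limit.
v_lo, v_hi = read_voltage_window(1e-6, 1e-9, 0.01, 128, 100e-6)
print(f"usable read voltage: {v_lo:.3f} V to {v_hi:.3f} V")  # 0.100 V to 0.781 V
```

Because the noise amplitude does not scale with current, the lower bound scales as 1/G, while the upper bound shrinks as more rows share a bit line.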

The programming voltage depends on the SET/RESET voltage of the synaptic devices. It must be higher than the switching threshold (SET/RESET voltage) but not so large as to cause hard breakdown. The voltage-time dilemma indicates that a linear reduction of the programming voltage incurs an exponential increase in programming time [53], [57]. The SET/RESET voltage depends only on the synaptic device itself and is usually less than 2 V. Sometimes, however, a barrier layer is added for a nonlinear I-V curve or better reliability. The new layer may take up part of the applied voltage and thus increase the SET/RESET voltage by up to several volts. Meanwhile, it should also be noted that too small a SET/RESET voltage may cause read disturbance [130]: if the read voltage is close to the SET/RESET voltage of the device, the resistance may change very quickly during inference. This discussion is valid for both RRAM and CBRAM.
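The voltage-time dilemma and its link to read disturbance can be illustrated with an empirical exponential law. This is a minimal sketch, assuming t(V) = t_ref exp((V_ref - V)/v0) with hypothetical values (10 ns at 1.0 V, v0 = 50 mV), and further assuming, as a simplification, that read stress accumulates linearly until it equals one switching event.

```python
import math


def switching_time(v, t_ref, v_ref, v0):
    """Empirical exponential voltage-time relation: t(V) = t_ref * exp((v_ref - V) / v0)."""
    return t_ref * math.exp((v_ref - v) / v0)


def reads_before_disturb(v_read, t_read_pulse, t_ref, v_ref, v0):
    """Rough number of read pulses whose accumulated stress equals one switching event."""
    return switching_time(v_read, t_ref, v_ref, v0) / t_read_pulse


# Hypothetical device: switches in 10 ns at 1.0 V, with v0 = 50 mV per e-fold.
T_REF, V_REF, V0 = 10e-9, 1.0, 0.05
for v_prog in (1.0, 0.9, 0.8):
    print(f"program at {v_prog:.1f} V: {switching_time(v_prog, T_REF, V_REF, V0):.2e} s")
for v_read in (0.2, 0.5):
    n = reads_before_disturb(v_read, 10e-9, T_REF, V_REF, V0)
    print(f"read at {v_read:.1f} V: ~{n:.1e} reads before disturb")
```

Each 0.1 V reduction of the programming voltage multiplies the switching time by e^2 here, and a read voltage closer to the switching voltage sharply reduces the number of safe reads.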

### 5.3 Resistance Levels and Variability

To study the feasibility of synaptic devices as analog weights in neural networks (NNs), a simulator (NeuroSim) has been developed [133] for a 2-layer multilayer perceptron (MLP) NN with synaptic device properties incorporated into the weights. As shown in Fig. 16, MNIST handwritten digits [134] are used as the training and testing dataset to implement online learning and offline classification. The MLP network topology is 400 (input layer)-100 (hidden layer)-10 (output layer). The 400 input neurons correspond to a 20×20 MNIST image (converted to black and white and edge-cropped), and the 10 output neurons correspond to the 10 digit classes. Such a simple 2-layer MLP can achieve 96-97% accuracy in the software baseline. In online learning, the MLP simulator takes the synaptic device properties into account when training the network with images randomly picked from the training dataset (60k images) and classifying the testing dataset (10k images). In offline classification, the network is pre-trained in software, and the MLP simulator performs only the classification with synaptic device properties.

As shown in Fig. 17, several non-ideal synaptic device properties are evaluated in the simulator, such as nonlinear and noisy weight update, limited weight precision and finite weight range. To analyze the effect of nonlinear weight update, a set of nonlinear curves is defined and labeled with nonlinearity values from 6 to -6 for both potentiation (weight increase) and depression (weight decrease). Because of the nonlinearity, potentiation and depression do not necessarily follow the same trajectory, resulting in asymmetry, with a positive nonlinearity value for potentiation and a negative one for depression. Experiments performed by various groups show that potentiation and depression have positive and negative nonlinearity, respectively [69], [92], [120], [135]. During the weight update, the device's conductance is tuned within a confined conductance range, and only a finite number of conductance states is available because of the limited weight precision. Ideally, the lowest conductance state (OFF state) should be low enough to represent zero weight in the algorithm, making the dynamic range (conductance ON/OFF ratio) sufficiently large. In reality, the ON/OFF ratio is always finite and normally not large enough. Different devices may even exhibit different ON/OFF ratios if the conductance range varies. On top of the nonlinear weight-update curves, there is also considerable weight-update variation from device to device, and even from pulse to pulse within one device. The effect of device-to-device weight-update variation can be analyzed by introducing variation into the nonlinearity baseline of each synaptic device, while cycle-to-cycle variation refers to the variation in conductance change at every programming pulse.
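The nonlinear, noisy weight update described above can be sketched with saturating-exponential potentiation and depression curves. This is a minimal sketch, assuming curves of the form G(P) = B(1 - exp(-P/A)) + G_min of the kind commonly used in such simulators; the mapping from the nonlinearity labels (6 to -6) to the constant `A` is not reproduced, and all names and values are hypothetical. As a simplification, the nominal pulse state `p` is tracked separately from the noisy conductance `g`.

```python
import math
import random


def _b(p_max, a, g_min, g_max):
    return (g_max - g_min) / (1 - math.exp(-p_max / a))


def ltp_conductance(p, p_max, a, g_min=0.0, g_max=1.0):
    """Potentiation curve: G(P) = B * (1 - exp(-P / A)) + G_min, P = 0 .. p_max."""
    return _b(p_max, a, g_min, g_max) * (1 - math.exp(-p / a)) + g_min


def ltd_conductance(p, p_max, a, g_min=0.0, g_max=1.0):
    """Depression curve on the same pulse axis: G(P) = -B * (1 - exp((P - p_max) / A)) + G_max."""
    return -_b(p_max, a, g_min, g_max) * (1 - math.exp((p - p_max) / a)) + g_max


def apply_pulse(p, g, potentiate, p_max, a, sigma_c2c, rng, g_min=0.0, g_max=1.0):
    """One programming pulse: nominal step from the curve plus cycle-to-cycle noise.

    sigma_c2c is the standard deviation as a fraction of the weight range.
    Depression moves the state from P to P - 1 along the LTD curve.
    """
    if potentiate:
        p_new = min(p + 1, p_max)
        dg = ltp_conductance(p_new, p_max, a, g_min, g_max) - ltp_conductance(p, p_max, a, g_min, g_max)
    else:
        p_new = max(p - 1, 0)
        dg = ltd_conductance(p_new, p_max, a, g_min, g_max) - ltd_conductance(p, p_max, a, g_min, g_max)
    dg += rng.gauss(0.0, sigma_c2c * (g_max - g_min))
    return p_new, min(max(g + dg, g_min), g_max)


# Device-to-device variation: each device draws its own A around a nominal value.
rng = random.Random(0)
a_devices = [max(1.0, rng.gauss(20.0, 4.0)) for _ in range(3)]
for a in a_devices:
    p, g = 0, 0.0
    for _ in range(64):
        p, g = apply_pulse(p, g, True, 64, a, sigma_c2c=0.01, rng=rng)
    print(f"A={a:.1f}: G after 64 LTP pulses = {g:.3f}")
```

A small `A` gives large early steps and small late steps (strong nonlinearity); as `A` grows the curves approach a straight line.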

To quantify the impact of these non-ideal device properties, sensitivity analyses were performed for online learning and offline classification using the simulator. Fig. 18a shows the weight precision requirement. The results suggest that 6-bit weights are required for online learning, while 2-bit weights are sufficient for offline classification (at least for the MNIST dataset), and 1-bit weights introduce only a slight degradation. Fig. 18b shows the learning accuracy for different ON/OFF ratios. A limited ON/OFF ratio ($<50$) degrades the accuracy of offline classification. Because the network may adapt itself to a limited ON/OFF ratio during learning, online learning can tolerate a lower ON/OFF ratio (a ratio $>10$ is needed). However, the accuracy drop in online learning is sharper, probably because the network deviates further from its correct form when both the weighted sum and the weight update are erroneous. Fig. 18c shows the impact of weight update nonlinearity and asymmetry. The results show that asymmetry (positive nonlinearity for potentiation P and negative nonlinearity for depression D) is the key factor that degrades the accuracy, and that high nonlinearity can be tolerated if P and D have the same polarity. However, in the common situation where P/D is positive/negative, nonlinearity has a critical impact on online learning accuracy: high accuracy can only be achieved with small nonlinearity ($<1$). Offline classification has no asymmetry or nonlinearity issue, because the cell conductance can be iteratively programmed to the desired value [136]. Variation sensitivity analyses are performed for online learning with different asymmetry and nonlinearity values (P/D: positive/negative). Fig. 18d shows the impact of conductance range variation on the learning accuracy. We added the variation (with standard deviation ($\sigma$) expressed as a percentage) to the highest conductance state (ON state), as it changes the conductance range the most. The results show that conductance range variation does not degrade the learning accuracy; instead, it remedies the accuracy loss due to high nonlinearity. The opposite trend is observed for device-to-device variation, as shown in Fig. 18e.
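
The weight-precision and ON/OFF-ratio requirements can be illustrated with a simple mapping. The sketch below assumes, hypothetically, that a weight in [0, 1] is quantized to 2^n evenly spaced levels and mapped linearly onto [G_off, G_on] with G_off = G_on / r, and that the readout naively treats G / G_on as the weight. Under these assumptions a zero weight reads back as 1/r instead of 0, which is one way to see why a finite ON/OFF ratio corrupts the weighted sum. The function names `quantize` and `effective_weight` are illustrative, not the simulator's API.

```python
def quantize(w, n_bits):
    """Round w (clipped to [0, 1]) to the nearest of 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits - 1
    w = min(max(w, 0.0), 1.0)
    return round(w * levels) / levels  # Python's round() breaks ties to even


def effective_weight(w, n_bits, on_off_ratio):
    """Weight seen by a readout that assumes the OFF state is exactly zero."""
    g_on = 1.0                       # normalized ON (maximum) conductance
    g_off = g_on / on_off_ratio      # finite OFF (minimum) conductance
    g = g_off + quantize(w, n_bits) * (g_on - g_off)
    return g / g_on


for ratio in (10, 50, 1000):
    print(ratio, effective_weight(0.0, 6, ratio))  # zero weight reads back as 1/ratio
```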

The amount of device-to-device variation is defined as the standard deviation ($\sigma$) of the nonlinearity. At low nonlinearity ($<1$), the accuracy decreases slightly with larger variation. For nonlinearity $>1$, the impact becomes much more prominent. On the other hand, the amount of cycle-to-cycle variation ($\sigma$) is expressed as a percentage of the entire weight range. As shown in Fig. 18f, small cycle-to-cycle variation ($<2\%$) can alleviate the degradation of learning accuracy caused by high nonlinearity. The reason may be that the random disturbance aids the convergence of the weights to an optimal weight pattern (i.e., it helps the system jump out of local minima). Thus, synaptic devices with nonlinear weight update may perform better than expected if their weight update is slightly noisy. However, too large a variation ($>2\%$) overwhelms the deterministic weight update amount defined by the algorithm and is therefore harmful to the learning accuracy. This set of simulations helps define the synaptic device characteristics that enable high online learning accuracy. In summary, a symmetric, close-to-linear weight update with a sufficient ON/OFF ratio is critical, while a reasonable amount of device-to-device and cycle-to-cycle variation can be tolerated. Because the simulation presented in this section is generic and based on varying the device properties, the analysis is technology-agnostic, and the conclusions are valid for any type of resistive memory device.
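
The two variation definitions above can be written down directly. In the hypothetical sketch below, cycle-to-cycle variation adds zero-mean Gaussian noise, with a standard deviation equal to a percentage of the weight range, at every programming pulse, and device-to-device variation draws one nonlinearity value per device from a Gaussian around a nominal baseline. The Gaussian distributions, the clipping to the conductance range, and all names and parameter values are assumptions for illustration.

```python
import random
import statistics


def noisy_pulse(g, delta, g_min, g_max, sigma_pct, rng):
    """One programming pulse with cycle-to-cycle noise.

    The nominal change `delta` is perturbed by Gaussian noise whose standard
    deviation is `sigma_pct` percent of the full weight range, and the result
    is clipped to [g_min, g_max].
    """
    noise = rng.gauss(0.0, sigma_pct / 100.0 * (g_max - g_min))
    return min(max(g + delta + noise, g_min), g_max)


def sample_device_nonlinearity(nominal, sigma, n_devices, rng):
    """Device-to-device variation: one nonlinearity value drawn per device."""
    return [rng.gauss(nominal, sigma) for _ in range(n_devices)]


rng = random.Random(0)
g_min, g_max = 1.0, 10.0
# Spread of conductance changes for a zero nominal update at 2 % noise.
changes = [noisy_pulse(5.5, 0.0, g_min, g_max, 2.0, rng) - 5.5 for _ in range(20000)]
print(statistics.stdev(changes))  # close to 0.02 * (10 - 1) = 0.18
print(sample_device_nonlinearity(2.0, 0.5, 4, rng))
```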

## 6 Perspective on the Device Parameters for Large Scale Neural Network Architectures

For a broad class of neuromorphic applications, a large conductance switching range with a linear response to identical switching pulses is desired. There is an exponential relationship between switching pulse width and pulse amplitude, and low-energy switching requires both the pulse width and the pulse amplitude to be as small as possible. Considerable work has been done to find material combinations with the desired switching characteristics. Based on our review of such devices in Sections 4 and 5, we summarize the current state-of-the-art device parameters reported for neuromorphic applications in Fig. 19. Fig. 19a shows the switching pulse width as a function of switching pulse amplitude for different RRAM and CBRAM device stacks used for neuromorphic applications, and Fig. 19b shows the reported conductance range as a function of pulse amplitude for the corresponding devices. Devices ideally suited to neuromorphic applications should provide a large conductance switching range at low pulse amplitude and small pulse width. In Fig. 19a, the direction of smaller switching energy is marked with an arrow; the ideal device stack would lie in the corner the arrow points to. Based on this metric alone, Pt/GeSO/TiN [137], TiN/TaO<sub>x</sub>/Pt [138] and TiN/SiO<sub>x</sub>/TaO<sub>x</sub>/Pt [138] would be the better choices. However, Fig. 19b shows that these devices have high conductance, which is undesirable because a large array of such devices would draw large currents; their range of conductance change is also very small. Considering both figures of merit, the optimum choice would be TiN/HfO<sub>x</sub>/AlO<sub>x</sub>/Pt [139] (data point 9 in Fig. 19), which shows two orders of magnitude of conductance switching at short switching pulse width.

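
A back-of-the-envelope sketch can make the two figures of merit in Fig. 19 concrete. Assuming, as a simplification, that the cell behaves as a linear conductance G during a rectangular pulse of amplitude V and width t, the energy dissipated per pulse is roughly E = V² G t, so reducing either the amplitude or the width lowers the switching energy. Likewise, by Kirchhoff's current law, a column read with all rows active carries the read voltage times the sum of the cell conductances, so high-conductance devices quickly lead to large array currents. The function names and all numbers below are hypothetical and are not taken from Fig. 19.

```python
def pulse_energy(v_pulse, g_cell, t_pulse):
    """Approximate energy (J) of one rectangular pulse on a linear conductance.

    Assumes I = G * V throughout the pulse, so E = V**2 * G * t.
    """
    return v_pulse ** 2 * g_cell * t_pulse


def column_current(v_read, conductances):
    """Total current (A) into one column when every row is driven at v_read."""
    return sum(g * v_read for g in conductances)


# Hypothetical cells: 1 V, 10 ns pulses on a 100 uS versus a 1 uS device.
print(pulse_energy(1.0, 100e-6, 10e-9))  # about 1e-12 J (1 pJ)
print(pulse_energy(1.0, 1e-6, 10e-9))    # about 1e-14 J (10 fJ)

# Hypothetical 1024-row column read at 0.2 V.
print(column_current(0.2, [1e-3] * 1024))  # about 0.2 A for 1 mS cells
print(column_current(0.2, [1e-6] * 1024))  # about 0.2 mA for 1 uS cells
```
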
Another promising device is the HfO<sub>x</sub> device with a thermally enhanced layer (TEL) [70] (data point 14 in Fig. 19), which switches fast at low voltage while its conductance is not too high. The range of conductance switching needs to be increased to ensure higher-precision matrix-vector multiplication for neural network applications. Cycle-to-cycle variation limits the number of distinguishable resistance states, which also decreases the precision of the matrix-vector multiplication. Simulations suggest that smaller networks can tolerate some device-to-device variation, but lower device-to-device variation is needed to scale up the network. One useful capability of a non-volatile memory array for in-memory computing is “blind weight update”, which avoids the extra read operations of a write-verify scheme during the write sequence. To support this capability in an array, a highly linear resistance switching response to identical pulses is required in addition to low cycle-to-cycle variation. Linearity remains a limitation for inorganic devices, whereas certain organic devices show high linearity [19]; among inorganic devices, the device with a thermally enhanced layer (TEL) shows considerable promise in this regard [70].
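
The difference between write-verify programming and blind weight update can be sketched with a toy device. The device model is a hypothetical assumption: each pulse changes the conductance by a fixed step plus Gaussian cycle-to-cycle noise. Write-verify reads the cell after every pulse and stops once it is within a tolerance of the target; blind update applies the pulse count requested by the algorithm without any read, which saves the reads but lands on target only if the response is linear and the noise is small. `ToyDevice`, `blind_update`, and `write_verify` are illustrative names.

```python
import random


class ToyDevice:
    """Hypothetical cell: each pulse moves G by +/- step plus Gaussian noise."""

    def __init__(self, g, g_min, g_max, step, sigma, seed=0):
        self.g, self.g_min, self.g_max = g, g_min, g_max
        self.step, self.sigma = step, sigma
        self.rng = random.Random(seed)
        self.reads = 0  # number of read (verify) operations performed

    def pulse(self, direction):
        dg = direction * self.step + self.rng.gauss(0.0, self.sigma)
        self.g = min(max(self.g + dg, self.g_min), self.g_max)

    def read(self):
        self.reads += 1
        return self.g


def blind_update(dev, n_pulses):
    """Open loop: apply the requested pulse count and never read the cell."""
    direction = 1 if n_pulses >= 0 else -1
    for _ in range(abs(n_pulses)):
        dev.pulse(direction)


def write_verify(dev, target, tol, max_iter=1000):
    """Closed loop: read, compare with target, pulse toward it, repeat."""
    for _ in range(max_iter):
        g = dev.read()
        if abs(g - target) <= tol:
            return True
        dev.pulse(1 if g < target else -1)
    return False


dev_blind = ToyDevice(g=1.0, g_min=1.0, g_max=10.0, step=0.1, sigma=0.01, seed=1)
dev_verify = ToyDevice(g=1.0, g_min=1.0, g_max=10.0, step=0.1, sigma=0.01, seed=1)
blind_update(dev_blind, 40)  # nominally 1.0 + 40 * 0.1 = 5.0
ok = write_verify(dev_verify, 5.0, tol=0.1)
print(dev_blind.g, dev_blind.reads)          # near 5.0 with zero reads
print(ok, dev_verify.g, dev_verify.reads)    # within tolerance, many reads
```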

## 8 Conclusion

The inference and training of today's state-of-the-art deep neural networks (DNNs) demand energy efficiency beyond what general-purpose architectures can deliver, because such architectures cannot provide the optimized dataflow needed to achieve the desired computing throughput at low energy cost. Specialized hardware accelerators improve the energy efficiency of DNN inference and training by optimizing the memory hierarchy and dataflow, improving parallelism, and leveraging special properties of neural networks such as error tolerance and sparsity. Emerging on-chip NVM provides a path to further energy efficiency by performing highly parallel analog multiply-accumulate operations and weight updates directly inside the memory, eliminating data movement. The capability to integrate terabyte-scale memory on chip enables hardware design to keep up with the increasing model size and computational complexity of DNN models, and on-chip memory integration provides highly parallel, high-bandwidth memory access. DNN inference and training pose different sets of requirements on NVM device characteristics. In general, a larger conductance range, more intermediate states, and higher resistance are desirable for both inference and training. For inference, an ideal device should also have a linear I-V relationship and long retention time. For training, a symmetric and linear pulse response, small device-to-device and cycle-to-cycle variation, and good endurance are crucial. In this paper, we reviewed the state-of-the-art emerging non-volatile memory devices. None of the devices we reviewed simultaneously combines all of these favorable properties. Besides further device engineering, it is crucial for hardware designers to select proper material stacks and make reasonable tradeoffs depending on the target application.
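
As a rough illustration of performing the weight update directly inside memory, the sketch below applies the rank-one (outer-product) update used by backpropagation for a fully connected layer. The crossbar convention is an assumption: row i carries input x[i], column j produces y[j] = Σ_i g[i][j] x[i], and delta[j] is the error at output j, so the gradient-descent update is g[i][j] -= lr · x[i] · delta[j]. In an array, such an update can in principle be applied to all cells in parallel by driving rows and columns with pulses that encode x and delta; here it is simply computed in Python, it assumes an ideal linear and symmetric device response, and the learning rate and values are hypothetical.

```python
def outer_product_update(g, x, delta, lr):
    """Return the updated weights g[i][j] - lr * x[i] * delta[j].

    Assumed crossbar convention: row i carries input x[i], column j produces
    output y[j] = sum_i g[i][j] * x[i], and delta[j] is the error at output j.
    """
    return [[g_ij - lr * x_i * d_j for g_ij, d_j in zip(row, delta)]
            for row, x_i in zip(g, x)]


weights = [[0.5, -0.2],
           [0.1, 0.3]]
new_weights = outer_product_update(weights, x=[1.0, 0.0], delta=[0.2, -0.4], lr=0.5)
print(new_weights)  # the row with zero input is left unchanged
```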

The switching mechanism in resistive RAM involves the movement of oxygen ions to and from oxygen vacancies. Therefore, controlling oxygen ion movement during pulsed switching in RRAM is a promising way to achieve the performance goals above. Placing an oxygen ion barrier to form a bilayer RRAM, and confining the heat generated during switching, have both shown significant improvements in analog switching. Better thermal management in RRAM can also stabilize the filament, which could improve reliability metrics such as retention and endurance. If the ideal device characteristics can be achieved, the most important advantage of a Kirchhoff's-law-based analog matrix-vector multiplication array using NVMs is that it can provide ultra-low-energy, high-throughput computing without compromising bit precision, a combination that is currently missing in the neural network accelerator landscape.
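
The Kirchhoff's-law-based matrix-vector multiplication can be sketched in a few lines. Assuming ideal, linear (ohmic) cells and negligible wire resistance, driving row i with voltage V_i makes cell (i, j) contribute a current G_ij V_i by Ohm's law, and Kirchhoff's current law sums these contributions on column j, so I_j = Σ_i G_ij V_i. This is also why a linear I-V relationship is desirable for inference. The Python below only emulates what the array computes physically in a single step; the function name and example values are hypothetical.

```python
def crossbar_mvm(conductances, voltages):
    """Column currents I[j] = sum_i G[i][j] * V[i] for an ideal crossbar.

    conductances[i][j] is the cell at row i, column j (siemens); voltages[i]
    is the read voltage on row i (volts). Returns one current per column (A).
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductances, voltages):
        for j, g in enumerate(row):
            currents[j] += g * v  # Ohm's law per cell; KCL sums each column
    return currents


G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
print(crossbar_mvm(G, [0.1, 0.2]))
```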

## Acknowledgment

This work is supported in part by ASCENT (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and member companies of the Stanford SystemX Alliance and the Stanford Non-Volatile Memory Technology Research Initiative (NMTRI). The authors also acknowledge Beijing Innovation Center for Future Chips (ICFC), Beijing Municipal Science and Technology Project (Z181100003218001), and NSFC (61674089, 61674092).

## References

- [1] C. Mead, "Neuromorphic electronic systems," *Proc. IEEE*, vol. 78, no. 10, pp. 1629–1636, 1990.
- [2] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," in *2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, 2014, pp. 10–14.
- [3] K. Ando *et al.*, "BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS," in *2017 Symposium on VLSI Circuits*, 2017, pp. C24–C25.
- [4] T. Tang, L. Xia, B. Li, Y. Wang, and H. Yang, "Binary convolutional neural network on RRAM," in *2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC)*, 2017, pp. 782–787.
- [5] R. Andri, L. Cavigelli, D. Rossi, and L. Benini, "YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration," *IEEE Trans. Comput. Des. Integr. Circuits Syst.*, vol. 37, no. 1, pp. 48–60, 2018.

- [6] Y. Chen *et al.*, “DaDianNao: A Machine-Learning Supercomputer,” in *2014 47th Annual IEEE/ACM International Symposium on Microarchitecture*, 2014, pp. 609–622.
- [7] S. Han *et al.*, “EIE: Efficient Inference Engine on Compressed Deep Neural Network,” in *Proceedings of the 43rd International Symposium on Computer Architecture*, 2016, pp. 243–254.
- [8] Y. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks,” *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 127–138, 2017.
- [9] A. Shafiee *et al.*, “ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars,” in *2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)*, 2016, pp. 14–26.
- [10] P. Chi *et al.*, “PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory,” in *2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)*, 2016, pp. 27–39.
- [11] S. Angizi, Z. He, F. Parveen, and D. Fan, “RIMPA: A New Reconfigurable Dual-Mode In-Memory Processing Architecture with Spin Hall Effect-Driven Domain Wall Motion Device,” in *2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, 2017, pp. 45–50.
- [12] A. Parashar *et al.*, “SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks,” in *Proceedings of the 44th Annual International Symposium on Computer Architecture*, 2017, pp. 27–40.
- [13] D. Fan and S. Angizi, “Energy Efficient In-Memory Binary Deep Neural Network Accelerator with Dual-Mode SOT-MRAM,” in *2017 IEEE International Conference on Computer Design (ICCD)*, 2017, pp. 609–612.
- [14] S. B. Eryilmaz, D. Kuzum, S. Yu, and H. S. P. Wong, “Device and system level design considerations for analog-non-volatile-memory based neuromorphic architectures,” in *Technical Digest - International Electron Devices Meeting, IEDM*, 2015.
- [15] D. Kuzum, S. Yu, and H. S. Philip Wong, “Synaptic electronics: Materials, devices and applications,” *Nanotechnology*, vol. 24, no. 38, 2013.
- [16] L. O. Chua, “Memristor—The Missing Circuit Element,” *IEEE Trans. Circuit Theory*, vol. 18, no. 5, pp. 507–519, 1971.
- [17] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing memristor found,” *Nature*, vol. 453, no. 7191, pp. 80–83, May 2008.
- [18] S. Vongehr and X. Meng, “The missing memristor has not been found,” *Sci. Rep.*, vol. 5, no. 1, p. 11657, Dec. 2015.
- [19] Y. van de Burgt, A. Melianas, S. T. Keene, G. Malliaras, and A. Salleo, “Organic electronics for neuromorphic computing,” *Nat. Electron.*, vol. 1, no. 7, pp. 386–397, Jul. 2018.
- [20] G. W. Burr *et al.*, “Neuromorphic computing using non-volatile memory,” *Adv. Phys. X*, vol. 2, no. 1, pp. 89–124, Jan. 2017.
- [21] S. Yu, “Neuro-Inspired Computing With Emerging Nonvolatile Memorys,” *Proc. IEEE*, vol. 106, no. 2, pp. 260–285, Feb. 2018.
- [22] M. Hartley, N. Taylor, and J. Taylor, “Understanding spike-time-dependent plasticity: A biologically motivated computational model,” *Neurocomputing*, vol. 69, no. 16, pp. 2005–2016, 2006.
- [23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*, 2016.
- [24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in *Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1*, 2012, pp. 1097–1105.
- [25] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” *Nature*, vol. 521, p. 436, May 2015.
- [26] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” *CoRR*, vol. abs/1409.1, 2014.
- [27] C. Szegedy *et al.*, “Going deeper with convolutions,” in *2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*, 2015, pp. 1–9.
- [28] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-Scale Video Classification with Convolutional Neural Networks,” in *Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition*, 2014, pp. 1725–1732.
- [29] “Intel Architecture Instruction Set Extensions Programming Reference,” Santa Clara, CA, 2017.
- [30] L. Durant, O. Giroux, M. Harris, and N. Stam, “Inside Volta: The World’s Most Advanced Data Center GPU,” *NVIDIA Developer Blog*, May-2017.
- [31] “CNN-Benchmarks.” [Online]. Available: <https://github.com/jcjohnson/cnn-benchmarks>.
- [32] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, “A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks,” in *2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops*, 2014, pp. 696–701.
- [33] T. Chen *et al.*, “DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning,” in *Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems*, 2014, pp. 269–284.
- [34] Z. Du *et al.*, “ShiDianNao: Shifting vision processing closer to the sensor,” in *2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)*, 2015, pp. 92–104.
- [35] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks,” in *Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays*, 2015, pp. 161–170.
- [36] S. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H. Yoo, “4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel SIMD architecture for big-data applications,” in *2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers*, 2015, pp. 1–3.
- [37] N. P. Jouppi *et al.*, “In-Datacenter Performance Analysis of a Tensor Processing Unit,” in *Proceedings of the 44th Annual International Symposium on Computer Architecture*, 2017, pp. 1–12.
- [38] B. Moons and M. Verhelst, “A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets,” in *2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits)*, 2016, pp. 1–2.
- [39] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” *CoRR*, vol. abs/1506.0, 2015.
- [40] S. Natarajan *et al.*, “A 14nm logic technology featuring 2nd-generation FinFET, air-gapped interconnects, self-aligned double patterning and a 0.0588 $\mu\text{m}^2$ SRAM cell size,” in *2014 IEEE International Electron Devices Meeting*, 2014, p. 3.7.1-3.7.3.
- [41] S. Wu *et al.*, “A 7nm CMOS platform technology featuring 4th generation FinFET transistors with a 0.027um<sup>2</sup> high density 6-T SRAM cell for mobile SoC applications,” in *2016 IEEE International Electron Devices Meeting (IEDM)*, 2016, p. 2.6.1-2.6.4.
- [42] “Nvidia Volta Architecture.” [Online]. Available: <http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf>.
- [43] C. Villa, D. Mills, G. Barkley, H. Giduturi, S. Schippers, and D. Vimercati, “A 45nm 1Gb 1.8V phase-change memory,” in *2010 IEEE International Solid-State Circuits Conference - (ISSCC)*, 2010, pp. 270–271.
- [44] Y. Choi *et al.*, “A 20nm 1.8V 8Gb PRAM with 40MB/s program bandwidth,” in *2012 IEEE International Solid-State Circuits Conference*, 2012, pp. 46–48.
- [45] R. Fackenthal *et al.*, “19.7 A 16Gb ReRAM with 200MB/s write and 1GB/s read in 27nm technology,” in *2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, 2014, pp. 338–339.
- [46] T. Liu *et al.*, “A 130.7mm<sup>2</sup> 2-Layer 32-Gb ReRAM Memory Device in 24-nm Technology,” *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 140–153, 2014.
- [47] W. Otsuka *et al.*, “A 4Mb conductive-bridge resistive memory with 2.3GB/s read-throughput and 216MB/s program-throughput,” in *2011 IEEE International Solid-State Circuits Conference*, 2011, pp. 210–211.

- [48] K. Rho *et al.*, “23.5 A 4Gb LPDDR2 STT-MRAM with compact 9F2 1T1MTJ cell and hierarchical bitline architecture,” in *2017 IEEE International Solid-State Circuits Conference (ISSCC)*, 2017, pp. 396–397.
- [49] S. Sidler *et al.*, “Large-scale neural networks implemented with Non-Volatile Memory as the synaptic weight element: Impact of conductance response,” in *2016 46th European Solid-State Device Research Conference (ESSDERC)*, 2016, pp. 440–443.
- [50] P. Chen, L. Gao, and S. Yu, “Design of Resistive Synaptic Array for Implementing On-Chip Sparse Learning,” *IEEE Trans. Multi-Scale Comput. Syst.*, vol. 2, no. 4, pp. 257–264, 2016.
- [51] P.-Y. Chen, Z. Li, and S. Yu, “Design Tradeoffs of Vertical RRAM-Based 3-D Cross-Point Array,” *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 24, no. 12, pp. 3460–3467, Dec. 2016.
- [52] R. Waser, “Resistive non-volatile memory devices (Invited Paper),” *Microelectron. Eng.*, vol. 86, no. 7, pp. 1925–1928, 2009.
- [53] R. Waser, R. Dittmann, G. Staikov, and K. Szot, “Redox-Based Resistive Switching Memories – Nanoionic Mechanisms, Prospects, and Challenges,” *Adv. Mater.*, vol. 21, no. 25-26, pp. 2632–2663, Jul. 2009.
- [54] R. Waser and M. Aono, “Nanoionics-based resistive switching memories,” *Nat. Mater.*, vol. 6, p. 833, Nov. 2007.
- [55] A. Sawa, “Resistive switching in transition metal oxides,” *Mater. Today*, vol. 11, no. 6, pp. 28–36, 2008.
- [56] H. Akinaga and H. Shima, “Resistive Random Access Memory (ReRAM) Based on Metal Oxides,” *Proc. IEEE*, vol. 98, no. 12, pp. 2237–2251, 2010.
- [57] H.-S. P. Wong *et al.*, “Metal–Oxide RRAM,” *Proc. IEEE*, vol. 100, no. 6, pp. 1951–1970, Jun. 2012.
- [58] D. S. Jeong *et al.*, “Emerging memories: resistive switching mechanisms and current status,” *Reports Prog. Phys.*, vol. 75, no. 7, p. 076502, Jul. 2012.
- [59] S.-Y. Wang, C.-W. Huang, D.-Y. Lee, T.-Y. Tseng, and T.-C. Chang, “Multilevel resistive switching in Ti/CuxO/Pt memory devices,” *J. Appl. Phys.*, vol. 108, no. 11, p. 114110, 2010.
- [60] C. Yoshida, K. Tsunoda, H. Noshiro, and Y. Sugiyama, “High speed resistive switching in Pt/TiO<sub>2</sub>/TiN film for nonvolatile memory application,” *Appl. Phys. Lett.*, vol. 91, no. 22, p. 223510, Nov. 2007.
- [61] H. Y. Lee *et al.*, “Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO<sub>2</sub> based RRAM,” in *2008 IEEE International Electron Devices Meeting*, 2008, pp. 1–4.
- [62] E.-K. Lai *et al.*, “Tungsten Oxide Resistive Memory Using Rapid Thermal Oxidation of Tungsten Plugs,” *Jpn. J. Appl. Phys.*, vol. 49, no. 4S, p. 04DD17, 2010.
- [63] M. Terai, Y. Sakotsubo, S. Kotsuji, and H. Hada, “Resistance Controllability of Ta<sub>2</sub>O<sub>5</sub>/TiO<sub>2</sub> Stack ReRAM for Low-Voltage and Multilevel Operation,” *IEEE Electron Device Lett.*, vol. 31, no. 3, pp. 204–206, 2010.
- [64] B.-G. Chae *et al.*, “Nanometer-Scale Phase Transformation Determines Threshold and Memory Switching Mechanism,” *Adv. Mater.*, vol. 29, no. 30, p. 1701752, Aug. 2017.
- [65] X. Zhao, H. Xu, Z. Wang, L. Zhang, J. Ma, and Y. Liu, “Nonvolatile/volatile behaviors and quantized conductance observed in resistive switching memory based on amorphous carbon,” *Carbon N. Y.*, vol. 91, pp. 38–44, Sep. 2015.
- [66] S. Yu *et al.*, “Binary neural network with 16 Mb RRAM macro chip for classification and online training,” in *Technical Digest - International Electron Devices Meeting, IEDM*, 2017.
- [67] M.-J. Lee *et al.*, “A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta<sub>2</sub>O<sub>5</sub>-x/TaO<sub>2</sub>-x bilayer structures,” *Nat. Mater.*, vol. 10, no. 8, pp. 625–630, Jul. 2011.
- [68] S. H. Misha *et al.*, “Effect of Nitrogen Doping on Variability of TaO<sub>x</sub> -RRAM for Low-Power 3-Bit MLC Applications,” *ECS Solid State Lett.*, vol. 4, no. 3, pp. P25–P28, Jan. 2015.
- [69] J. Woo *et al.*, “Improved Synaptic Behavior Under Identical Pulses Using AlO<sub>x</sub>/HfO<sub>2</sub> Bilayer RRAM Array for Neuromorphic Systems,” *IEEE Electron Device Lett.*, vol. 37, no. 8, pp. 994–997, 2016.
- [70] W. Wu, H. Wu, B. Gao, N. Deng, S. Yu, and H. Qian, “Improving Analog Switching in HfO<sub>x</sub> Based Resistive Memory with Thermal Enhanced Layer,” *IEEE Electron Device Lett.*, vol. 38, no. 8, 2017.
- [71] B. Govoreanu *et al.*, “Advanced a-VMCO resistive switching memory through inner interface engineering with wide (>10<sup>2</sup>) on/off window, tunable μA-range switching current and excellent variability,” in *2016 IEEE Symposium on VLSI Technology*, 2016, pp. 1–2.
- [72] Y. Bai *et al.*, “Stacked 3D RRAM Array with Graphene/CNT as Edge Electrodes,” *Sci. Rep.*, vol. 5, p. 13785, Sep. 2015.
- [73] I.-T. Wang, Y.-C. Lin, Y.-F. Wang, C.-W. Hsu, and T.-H. Hou, “3D synaptic architecture with ultralow sub-10 fJ energy per spike for neuromorphic computation,” in *2014 IEEE International Electron Devices Meeting*, 2014, p. 28.5.1-28.5.4.
- [74] J. Sohn, S. Lee, Z. Jiang, H. Chen, and H.-P. Wong, “Atomically thin graphene plane electrode for 3D RRAM,” in *2014 IEEE International Electron Devices Meeting*, 2014, p. 5.3.1-5.3.4.
- [75] S. Park *et al.*, “A non-linear ReRAM cell with sub-1μA ultralow operating current for high density vertical resistive memory (VRRAM),” in *2012 International Electron Devices Meeting*, 2012, p. 20.8.1-20.8.4.
- [76] M. Yu *et al.*, “Novel Vertical 3D Structure of TaOx-based RRAM with Self-localized Switching Region by Sidewall Electrode Oxidation,” *Sci. Rep.*, vol. 6, no. 1, p. 21020, Aug. 2016.
- [77] H.-Y. Chen, S. Yu, B. Gao, P. Huang, J. Kang, and H.-S. P. Wong, “HfO<sub>x</sub> based vertical resistive random access memory for cost-effective 3D cross-point architecture without cell selector,” in *2012 International Electron Devices Meeting*, 2012, p. 20.7.1-20.7.4.
- [78] H. Li *et al.*, “Hyperdimensional computing with 3D VRRAM in-memory kernels: Device-architecture co-design for energy-efficient, error-resilient language recognition,” in *2016 IEEE International Electron Devices Meeting (IEDM)*, 2016, p. 16.1.1-16.1.4.
- [79] S. Park *et al.*, “Electronic system with memristive synapses for pattern recognition,” *Sci. Rep.*, vol. 5, p. 10123, May 2015.
- [80] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, and D. B. Strukov, “Training and operation of an integrated neuromorphic network based on metal-oxide memristors,” *Nature*, vol. 521, no. 7550, pp. 61–64, May 2015.
- [81] L. Gao, P. Chen, and S. Yu, “Demonstration of Convolution Kernel Operation on Resistive Cross-Point Array,” *IEEE Electron Device Lett.*, vol. 37, no. 7, pp. 870–873, 2016.
- [82] P. Yao *et al.*, “Face classification using electronic synapses,” *Nat. Commun.*, vol. 8, p. 15199, May 2017.
- [83] Y. Hirose and H. Hirose, “Polarity-dependent memory switching and behavior of Ag dendrite in Ag-photodoped amorphous As<sub>2</sub>S<sub>3</sub> films,” *J. Appl. Phys.*, vol. 47, no. 6, pp. 2767–2772, 1976.
- [84] S. Z. Rahaman *et al.*, “Enhanced nanoscale resistive switching memory characteristics and switching mechanism using high-Ge-content Ge0.5Se0.5 solid electrolyte,” *Nanoscale Res. Lett.*, vol. 7, no. 1, p. 614, Nov. 2012.
- [85] E. Vianello *et al.*, “Sb-doped GeS<sub>2</sub> as performance and reliability booster in Conductive Bridge RAM,” in *2012 International Electron Devices Meeting*, 2012, p. 31.5.1-31.5.4.
- [86] S. Choi, J. Lee, H. Bae, W. Yang, T. Kim, and K. Kim, “Improvement of CBRAM Resistance Window by Scaling Down Electrode Size in Pure-GeTe Film,” *IEEE Electron Device Lett.*, vol. 30, no. 2, pp. 120–122, 2009.
- [87] C. Schindler, S. C. P. Thermadam, R. Waser, and M. N. Kozicki, “Bipolar and Unipolar Resistive Switching in Cu-Doped SiO<sub>2</sub>,” *IEEE Trans. Electron Devices*, vol. 54, no. 10, pp. 2762–2768, 2007.
- [88] Y. Li *et al.*, “Resistive Switching Properties of Au/ZrO<sub>2</sub>/Ag Structure for Low-Voltage Nonvolatile Memory Applications,” *IEEE Electron Device Lett.*, vol. 31, no. 2, pp. 117–119, 2010.

- [89] N. Banno *et al.*, “Diffusivity of Cu Ions in Solid Electrolyte and Its Effect on the Performance of Nanometer-Scale Switch,” *IEEE Trans. Electron Devices*, vol. 55, no. 11, pp. 3283–3287, 2008.
- [90] S. Z. Rahaman *et al.*, “Repeatable unipolar/bipolar resistive memory characteristics and switching mechanism using a Cu nanofilament in a GeO<sub>x</sub> film,” *Appl. Phys. Lett.*, vol. 101, no. 7, p. 73106, 2012.
- [91] M. Tada *et al.*, “Highly scalable nonvolatile TiO<sub>x</sub>/TaSiO<sub>y</sub> solid-electrolyte crossbar switch integrated in local interconnect for low power reconfigurable logic,” in *2009 IEEE International Electron Devices Meeting (IEDM)*, 2009, pp. 1–4.
- [92] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale Memristor Device as Synapse in Neuromorphic Systems,” *Nano Lett.*, vol. 10, no. 4, pp. 1297–1301, 2010.
- [93] I. Valov, R. Waser, J. R. Jameson, and M. N. Kozicki, “Electrochemical metallization memories—fundamentals, applications, prospects,” *Nanotechnology*, vol. 22, no. 25, p. 254003, 2011.
- [94] D. Jana *et al.*, “Conductive-bridging random access memory: challenges and opportunity for 3D architecture,” *Nanoscale Res. Lett.*, vol. 10, no. 1, p. 188, Dec. 2015.
- [95] J. Yoon *et al.*, “Excellent Switching Uniformity of Cu-Doped MoO<sub>x</sub>/GdO<sub>x</sub> Bilayer for Nonvolatile Memory Applications,” *IEEE Electron Device Lett.*, vol. 30, no. 5, pp. 457–459, 2009.
- [96] S. Z. Rahaman *et al.*, “Excellent resistive memory characteristics and switching mechanism using a Ti nanolayer at the Cu/TaO<sub>x</sub> interface,” *Nanoscale Res. Lett.*, vol. 7, no. 1, p. 345, Jun. 2012.
- [97] S. Z. Rahaman *et al.*, “Impact of TaO<sub>x</sub> nanolayer at the GeSex/W interface on resistive switching memory performance and investigation of Cu nanofilament,” *J. Appl. Phys.*, vol. 111, no. 6, p. 63710, 2012.
- [98] L. Goux *et al.*, “Influence of the Cu-Te composition and microstructure on the resistive switching of Cu-Te/Al<sub>2</sub>O<sub>3</sub>/Si cells,” *Appl. Phys. Lett.*, vol. 99, no. 5, p. 53502, 2011.
- [99] A. Belmonte *et al.*, “90nm W\Al<sub>2</sub>O<sub>3</sub>\TiW\Cu 1T1R CBRAM cell showing low-power, fast and disturb-free operation,” in *2013 5th IEEE International Memory Workshop*, 2013, pp. 26–29.
- [100] K. Aratani *et al.*, “A Novel Resistance Memory with High Scalability and Nanosecond Switching,” in *2007 IEEE International Electron Devices Meeting*, 2007, pp. 783–786.
- [101] S. Fujii *et al.*, “Scaling the CBRAM Switching Layer Diameter to 30 nm Improves Cycling Endurance,” *IEEE Electron Device Lett.*, vol. 39, no. 1, pp. 23–26, Jan. 2018.
- [102] S. Yu and H.-P. Wong, “Modeling the switching dynamics of programmable-metallization-cell (PMC) memory and its application as synapse device for a neuromorphic computation system,” in *2010 International Electron Devices Meeting*, 2010, p. 22.1.1-22.1.4.
- [103] E. O. Neftci, B. U. Pedroni, S. Joshi, M. Al-Shedivat, and G. Cauwenberghs, “Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines,” *Front. Neurosci.*, vol. 10, p. 241, 2016.
- [104] J. H. Lee and K. K. Likharev, “Defect-tolerant nanoelectronic pattern classifiers,” *Int. J. Circuit Theory Appl.*, vol. 35, no. 3, pp. 239–264, May 2007.
- [105] M. Suri *et al.*, “Bio-Inspired Stochastic Computing Using Binary CBRAM Synapses,” *IEEE Trans. Electron Devices*, vol. 60, no. 7, pp. 2402–2409, 2013.
- [106] C. Merkel, D. Kudithipudi, M. Suri, and B. Wysocki, “Stochastic CBRAM-Based Neuromorphic Time Series Prediction System,” *J. Emerg. Technol. Comput. Syst.*, vol. 13, no. 3, p. 37:1–37:14, Feb. 2017.
- [107] J. H. Yoon *et al.*, “Truly Electroforming-Free and Low-Energy Memristors with Preconditioned Conductive Tunneling Paths,” *Adv. Funct. Mater.*, vol. 27, no. 35, p. 1702010, Sep. 2017.
- [108] Y. Shi, S. Fong, H.-S. P. Wong, and D. Kuzum, “Synaptic Devices Based on Phase-Change Memory,” in *Neuro-inspired Computing Using Resistive Synaptic Devices*, S. Yu, Ed. Cham: Springer International Publishing, 2017, pp. 19–51.
- [109] D. Kuzum, R. G. D. Jeyasingh, B. Lee, and H. S. P. Wong, “Nanoelectronic programmable synapses based on phase change materials for brain-inspired computing,” *Nano Lett.*, 2012.
- [110] C. D. Wright, Y. Liu, K. I. Kohary, M. M. Aziz, and R. J. Hicken, “Arithmetic and Biologically-Inspired Computing Using Phase-Change Materials,” *Adv. Mater.*, vol. 23, no. 30, pp. 3408–3413, Jun. 2011.

- [111] M. Suri *et al.*, "Phase change memory as synapse for ultra-dense neuromorphic systems: Application to complex visual pattern extraction," in *2011 International Electron Devices Meeting*, 2011, pp. 4.4.1–4.4.4.
- [112] M. Suri *et al.*, "Interface Engineering of PCM for Improved Synaptic Performance in Neuromorphic Systems," in *2012 4th IEEE International Memory Workshop*, 2012, pp. 1–4.
- [113] G. W. Burr *et al.*, "Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element," in *2014 IEEE International Electron Devices Meeting*, 2014, pp. 29.5.1–29.5.4.
- [114] S. B. Eryilmaz *et al.*, "Experimental demonstration of array-level learning with phase change synaptic devices," in *2013 IEEE International Electron Devices Meeting*, 2013, pp. 25.5.1–25.5.4.
- [115] S. Kim *et al.*, "NVM neuromorphic core with 64k-cell (256-by-256) phase change memory synaptic array with on-chip neuron circuits for continuous in-situ learning," in *2015 IEEE International Electron Devices Meeting (IEDM)*, 2015, pp. 17.1.1–17.1.4.
- [116] S. Ambrogio *et al.*, "Unsupervised Learning by Spike Timing Dependent Plasticity in Phase Change Memory (PCM) Synapses," *Front. Neurosci.*, vol. 10, p. 56, 2016.
- [117] A. Sebastian *et al.*, "Temporal correlation detection using computational phase-change memory," *Nat. Commun.*, vol. 8, no. 1, p. 1115, Dec. 2017.
- [118] B. Gao, H. Wu, J. Kang, H. Yu, and H. Qian, "Oxide-based analog synapse: Physical modeling, experimental characterization, and optimization," in *2016 IEEE International Electron Devices Meeting (IEDM)*, 2016, pp. 7.3.1–7.3.4.
- [119] B. Gao *et al.*, "Modeling disorder effect of the oxygen vacancy distribution in filamentary analog RRAM for neuromorphic computing," in *2017 IEEE International Electron Devices Meeting (IEDM)*, 2017, pp. 4.4.1–4.4.4.
- [120] S. Park *et al.*, "Neuromorphic speech systems using advanced ReRAM-based synapse," in *2013 IEEE International Electron Devices Meeting*, 2013, pp. 25.6.1–25.6.4.
- [121] C. Peng *et al.*, "W-Sb-Te phase-change material: A candidate for the trade-off between programming speed and data retention," *Appl. Phys. Lett.*, vol. 101, no. 12, p. 122108, Sep. 2012.
- [122] D. Loke *et al.*, "Breaking the speed limits of phase-change memory," *Science*, vol. 336, no. 6088, pp. 1566–1569, Jun. 2012.
- [123] S. Muraoka, T. Ninomiya, Z. Wei, K. Katayama, R. Yasuhara, and T. Takagi, "Comprehensive understanding of conductive filament characteristics and retention properties for highly reliable ReRAM," in *2013 Symposium on VLSI Technology*, 2013, pp. T62–T63.
- [124] M. Suri *et al.*, "CBRAM devices as binary synapses for low-power stochastic neuromorphic systems: Auditory (Cochlea) and visual (Retina) cognitive processing applications," in *2012 International Electron Devices Meeting*, 2012, pp. 10.3.1–10.3.4.
- [125] T. Ninomiya *et al.*, "Conductive filament scaling of TaO<sub>x</sub> bipolar ReRAM for long retention with low current operation."
- [126] H. Wu *et al.*, "Resistive Random Access Memory for Future Information Processing System," *Proc. IEEE*, vol. 105, no. 9, pp. 1770–1789, 2017.
- [127] Y. Y. Chen *et al.*, "Endurance/Retention Trade-off on HfO<sub>2</sub>/Metal Cap 1T1R Bipolar RRAM," *IEEE Trans. Electron Devices*, vol. 60, no. 3, pp. 1114–1121, 2013.
- [128] E. Wu *et al.*, "Fundamental limitations of existing models and future solutions for dielectric reliability and RRAM applications (invited)," in *2017 IEEE International Electron Devices Meeting (IEDM)*, 2017, pp. 21.5.1–21.5.4.
- [129] B. Gao *et al.*, "Modeling of Retention Failure Behavior in Bipolar Oxide-Based Resistive Switching Memory," *IEEE Electron Device Lett.*, vol. 32, no. 3, pp. 276–278, 2011.
- [130] H. Chen *et al.*, "Towards high-speed, write-disturb tolerant 3D vertical RRAM arrays," in *2014 Symposium on VLSI Technology (VLSI-Technology): Digest of Technical Papers*, 2014, pp. 1–2.
- [131] D. Ielmini, F. Nardi, and C. Cagli, "Resistance-dependent amplitude of random telegraph-signal noise in resistive switching memories," *Appl. Phys. Lett.*, vol. 96, no. 5, p. 053503, 2010.

- [132] J. Kang *et al.*, "Time-dependent variability in RRAM-based analog neuromorphic system for pattern recognition," in *2017 IEEE International Electron Devices Meeting (IEDM)*, 2017, p. 6.4.1-6.4.4.
- [133] P. Chen, X. Peng, and S. Yu, "NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures," in *2017 IEEE International Electron Devices Meeting (IEDM)*, 2017, p. 6.1.1-6.1.4.
- [134] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," *Proc. IEEE*, vol. 86, no. 11, pp. 2278–2324, 1998.
- [135] L. Gao *et al.*, "Fully parallel write/read in resistive synaptic array for accelerating on-chip learning," *Nanotechnology*, vol. 26, no. 45, p. 455204, 2015.
- [136] L. Gao, P. Chen, and S. Yu, "Programming Protocol Optimization for Analog Weight Tuning in Resistive Memories," *IEEE Electron Device Lett.*, vol. 36, no. 11, pp. 1157–1159, 2015.
- [137] W. Zhang *et al.*, "An Electronic Synapse Device Based on Solid Electrolyte Resistive Random Access Memory," *IEEE Electron Device Lett.*, vol. 36, no. 8, pp. 772–774, 2015.
- [138] Z. Wang *et al.*, "Engineering incremental resistive switching in TaOx based memristors for brain-inspired computing," *Nanoscale*, vol. 8, no. 29, pp. 14015–14022, 2016.
- [139] S. Yu, Y. Wu, R. Jeyasingh, D. Kuzum, and H.-S. P. Wong, "An Electronic Synapse Device Based on Metal Oxide Resistive Switching Memory for Neuromorphic Computation," *IEEE Trans. Electron Devices*, vol. 58, no. 8, pp. 2729–2737, Aug. 2011.
- [140] M.-J. Lee *et al.*, "A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta<sub>2</sub>O<sub>5-x</sub>/TaO<sub>2-x</sub> bilayer structures," *Nat. Mater.*, vol. 10, p. 625, Jul. 2011.
- [141] F.-Y. Yuan *et al.*, "Conduction Mechanism and Improved Endurance in HfO<sub>2</sub>-Based RRAM with Nitridation Treatment," *Nanoscale Res. Lett.*, vol. 12, no. 1, p. 574, Dec. 2017.
- [142] B. Govoreanu *et al.*, "A-VMCO: A novel forming-free, self-rectifying, analog memory cell with low-current operation, nonfilamentary switching and excellent variability," in *2015 Symposium on VLSI Technology (VLSI Technology)*, 2015, pp. T132–T133.
- [143] R. Raina, A. Madhavan, and A. Y. Ng, "Large-scale Deep Unsupervised Learning Using Graphics Processors," in *Proceedings of the 26th Annual International Conference on Machine Learning*, 2009, pp. 873–880.
- [144] S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, "The SpiNNaker Project," *Proc. IEEE*, vol. 102, no. 5, pp. 652–665, 2014.
- [145] F. Akopyan *et al.*, "TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 34, no. 10, pp. 1537–1557, 2015.
- [146] Q. V. Le *et al.*, "Building High-level Features Using Large Scale Unsupervised Learning," in *Proceedings of the 29th International Conference on International Conference on Machine Learning*, 2012, pp. 507–514.
- [147] O. Bichler, M. Suri, D. Querlioz, D. Vuillaume, B. DeSalvo, and C. Gamrat, "Visual Pattern Extraction Using Energy-Efficient '2-PCM Synapse' Neuromorphic Architecture," *IEEE Trans. Electron Devices*, vol. 59, no. 8, pp. 2206–2214, 2012.
- [148] M. Djurfeldt, M. Lundqvist, C. Johansson, M. Rehn, O. Ekeberg, and A. Lansner, "Brain-scale simulation of the neocortex on the IBM Blue Gene/L supercomputer," *IBM J. Res. Dev.*, vol. 52, no. 1.2, pp. 31–41, 2008.
- [149] S. B. Eryilmaz *et al.*, "Training a Probabilistic Graphical Model With Resistive Switching Electronic Synapses," *IEEE Trans. Electron Devices*, vol. 63, no. 12, pp. 5004–5011, Dec. 2016.
- [150] D. George and J. Hawkins, "Towards a Mathematical Theory of Cortical Micro-circuits," *PLOS Comput. Biol.*, vol. 5, no. 10, pp. 1–26, 2009.
- [151] J. Schemmel, J. Fieres, and K. Meier, "Wafer-scale integration of analog neural networks," in *2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)*, 2008, pp. 431–438.
- [152] B. V. Benjamin *et al.*, "Neurogrid: A Mixed-Analog-Digital Multichip System for Large-Scale Neural Simulations," *Proc. IEEE*, vol. 102, no. 5, pp. 699–716, 2014.
- [153] P.-Y. Chen *et al.*, "Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip," in *Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015*, 2015, pp. 854–859.
- [154] S. Yu, B. Lee, and H.-S. P. Wong, "Metal Oxide Resistive Switching Memory," in *Functional Metal Oxide Nanostructures*, J. Wu, J. Cao, W.-Q. Han, A. Janotti, and H.-C. Kim, Eds. New York, NY: Springer New York, 2012, pp. 303–335.
- [155] C.-S. Yang, D.-S. Shang, Y.-S. Chai, L.-Q. Yan, B.-G. Shen, and Y. Sun, "Electrochemical-reaction-induced synaptic plasticity in MoO<sub>x</sub>-based solid state electrochemical cells," *Phys. Chem. Chem. Phys.*, vol. 19, no. 6, pp. 4190–4198, 2017.
- [156] J. Jang, S. Park, G. W. Burr, H. Hwang, and Y. Jeong, "Optimization of Conductance Change in Pr<sub>1-x</sub>Ca<sub>x</sub>MnO<sub>3</sub>-Based Synaptic Devices for Neuromorphic Systems," *IEEE Electron Device Lett.*, vol. 36, no. 5, pp. 457–459, 2015.
- [157] S. Kim, H. Kim, S. Hwang, M.-H. Kim, Y.-F. Chang, and B.-G. Park, "Analog Synaptic Behavior of a Silicon Nitride Memristor," *ACS Appl. Mater. Interfaces*, vol. 9, no. 46, pp. 40420–40427, 2017.
- [158] J. Woo, A. Padovani, K. Moon, M. Kwak, L. Larcher, and H. Hwang, "Linking Conductive Filament Properties and Evolution to Synaptic Behavior of RRAM Devices for Neuromorphic Applications," *IEEE Electron Device Lett.*, vol. 38, no. 9, pp. 1220–1223, Sep. 2017.
- [159] C. Chang *et al.*, "Mitigating Asymmetric Nonlinear Weight Update Effects in Hardware Neural Network Based on Analog Resistive Synapse," *IEEE J. Emerg. Sel. Top. Circuits Syst.*, vol. 8, no. 1, pp. 116–124, 2018.

Table 1 Comparison of reported RRAM and CBRAM devices with respect to key device parameters (NR = not reported)

| Type  | Material stack | Switching voltage (V) (SET/RESET) | Resistance (Ω) (LRS/HRS) | On/off ratio | Speed (ns) | Retention | Endurance (cycles) | Ref. |
|-------|----------------|-----------------------------------|--------------------------|--------------|------------|-----------|--------------------|------|
| RRAM  | TiN/TiO<sub>x</sub>/HfO<sub>x</sub>/TiN | +0.8/-0.8 | 1k/1M | 10<sup>3</sup> | 5 | 10 yr | 10<sup>5</sup> | [61] |
|       | Pt/Ta<sub>2</sub>O<sub>5-x</sub>/TaO<sub>2-x</sub>/Pt | -1/+2 | 30k/NR | NR | 10 | 10 yr @85 °C | 10<sup>12</sup> | [140] |
|       | Ta/Ta<sub>2</sub>O<sub>5</sub>:Ag/Ru | +0.7/-0.7 | 100k/10M | 10<sup>2</sup> | 100 | 115 days @RT | 5×10<sup>7</sup> | [107] |
|       | TiN/N:HfO<sub>2</sub>/Pt | +1/-1 | 1k/10k | 10 | 900 | 10<sup>4</sup> s @85 °C | 10<sup>9</sup> | [141] |
|       | TiN/TiO<sub>2</sub>/a-Si/TiN | +7/-7 | < 1 μA current for 30 nm device | NR | 10 | 3 yr @55 °C | 10<sup>6</sup> | [142] |
|       | Pt/Ti/TiO<sub>2-x</sub>/Al<sub>2</sub>O<sub>3</sub>/Pt | -2/+2 | 10k/100k | 10 | 5×10<sup>5</sup> | 10 yr | 5×10<sup>3</sup> | [80] |
|       | TiN/HfO<sub>x</sub>/AlO<sub>x</sub>/Pt | +1.4→+1.8/-2.2→-2.6 | 10k/1M | 10<sup>2</sup> | 50 | 7200 s | 10<sup>5</sup> | [69] |
| CBRAM | Pt/GeSO/TiN | +0.7/-1.1 | 200/500 | 2.5 | 100 | NR | 2×10<sup>3</sup> | [137] |
|       | TE/Cu-Te/GdO<sub>x</sub>/BE | +3/-1.7 | 10k/10M | 10<sup>3</sup> | 5 | 10<sup>3</sup> s | 10<sup>7</sup> | [100] |
|       | TE/Ag+Si/Si/BE | +3.2/-2.8 | 25M/200M | 8 | 3×10<sup>5</sup> | NR | 10<sup>7</sup> | [92] |
|       | Cu/SiO<sub>2</sub>/Pt | +1/-0.5 | 500M/5G | 10 | NR | NR | 10<sup>4</sup> | [101] |
|       | Cu/Ta<sub>2</sub>O<sub>5</sub>/Pt | +3.5/-2.5 | 100/100M | 10<sup>6</sup> | 10<sup>4</sup> | NR | 10<sup>4</sup> | [89] |
|       | Cu/TiW/Al<sub>2</sub>O<sub>3</sub>/W | +1/-1 | 100k/100M | 10<sup>3</sup> | 10 | 600 s @125 °C | 10<sup>6</sup> | [99] |



Figure 1 Neuromorphic computing paradigm. Each box lists already implemented examples together with the non-volatile memory device technology used (PCM = Phase Change Memory, RRAM = Resistive RAM, CBRAM = Conductive Bridging RAM). The region highlighted in yellow is the focus of this paper. Ref: [80], [113], [114], [143]–[153].



Figure 2 (a) Single-layer perceptron with four inputs and two outputs. (b) General computational form of a single-layer ANN. (c) Non-volatile memory crossbar array realizing the matrix-vector multiplication shown in (b); T = transistor, S = selector, R = resistor.
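The crossbar of Figure 2(c) evaluates the matrix-vector product of Figure 2(b) physically: read voltages $V_i$ drive the rows, each cross-point conductance $G_{ij}$ converts its voltage into a current by Ohm's law, and Kirchhoff's current law sums these currents along each column, so that $I_j = \sum_i G_{ij} V_i$. The following is a minimal sketch of that ideal read operation, assuming perfect devices (no wire resistance, sneak paths, or selector nonlinearity); the function name `crossbar_mvm` and all conductance and voltage values are hypothetical and not taken from the cited works.

```python
def crossbar_mvm(conductances, voltages):
    """Ideal crossbar read: column current I_j = sum_i G[i][j] * V[i].

    conductances: rows x columns matrix of cross-point conductances (S)
    voltages: one read voltage per row (V)
    Returns one output current per column (A).
    """
    n_rows = len(conductances)
    if n_rows != len(voltages):
        raise ValueError("need one input voltage per row")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's law at each cross-point; Kirchhoff's current law sums along column j.
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Four inputs and two outputs, as in the perceptron of Figure 2(a).
# Hypothetical conductances between 1 uS and 100 uS.
G = [
    [10e-6, 1e-6],
    [50e-6, 20e-6],
    [1e-6, 100e-6],
    [30e-6, 5e-6],
]
V = [0.1, 0.2, 0.0, 0.1]  # small read voltages so the stored states are not disturbed
I = crossbar_mvm(G, V)
```

Because physical conductances are non-negative, signed weights are commonly represented by the difference of two devices, as in the "2-PCM synapse" of Figure 10.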



Figure 3 Schematic illustration of the switching process in the simple binary metal-oxide RRAM. This figure is adapted from [154].



Figure 4 (a,b,c) Schematic representation of the denser, more controlled filament formation resulting from nitrogen incorporation. (d) Effect of the nitrogen doping amount on non-linearity and variability at a $30\ \mu\text{A}$ compliance current, used to establish a guideline for the feasibility of 3-bit MLC storage in $\text{N-TaO}_x$-based RRAM devices.

This figure is adapted from [68].



Figure 5 (a) SET and RESET characteristics of the $\text{HfO}_2$ 1T1R array under identical pulses, showing the conductance change as a function of the number of SET/RESET pulses. Conductance increments and decrements were achieved by applying either a higher voltage or a longer pulse width (PW). (b) Schematic illustration of the analog switching behavior in the $\text{AlO}_x/\text{HfO}_2$ RRAM. (c) Comparison of the SET/RESET switching of the $\text{HfO}_2$ and $\text{AlO}_x/\text{HfO}_2$ RRAM devices. In the $\text{AlO}_x/\text{HfO}_2$ device, potentiation and depression are obtained by applying identical pulses of $0.9\text{ V}$ and $1\text{ V}$, respectively, with a $100\ \mu\text{s}$ PW. This figure is adapted from [69].



Figure 6 (a) Typical DC I–V characteristics of the $\text{HfO}_x/\text{TEL}$ RRAM at room temperature (RT); the TEL layer improves analog switching. (b) Conductance of the $\text{HfO}_x/\text{TEL}$ RRAM versus the number of identical SET pulses at RT. (c) Conductance of the $\text{HfO}_x/\text{TEL}$ RRAM versus the number of identical RESET pulses at RT. (d) Average conductance change during SET and RESET of 256 $\text{HfO}_x/\text{TEL}$ RRAM devices in the array. This figure is adapted from [62].
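Figures 5 and 6 plot conductance against the number of identical SET or RESET pulses; in such curves the step size typically shrinks as the device approaches its conductance bound, which is what "nonlinear weight update" refers to. One widely used phenomenological way to describe this is a saturating exponential, $G(P) = B\,(1 - e^{-P/A}) + G_{min}$ with $B = (G_{max} - G_{min}) / (1 - e^{-P_{max}/A})$, where a smaller $A$ means a more nonlinear response. The sketch below is an illustration under that assumed model, not the fitting used in [62] or [69]; the function names and all parameter values are hypothetical.

```python
import math


def potentiation_curve(g_min, g_max, p_max, a):
    """Conductance after 0..p_max identical SET pulses.

    Saturating exponential G(P) = B * (1 - exp(-P / a)) + g_min, with B chosen
    so that G(p_max) == g_max. Smaller `a` gives a more nonlinear response.
    """
    b = (g_max - g_min) / (1.0 - math.exp(-p_max / a))
    return [g_min + b * (1.0 - math.exp(-p / a)) for p in range(p_max + 1)]


def depression_curve(g_min, g_max, p_max, a):
    """Conductance after 0..p_max identical RESET pulses, starting from g_max."""
    b = (g_max - g_min) / (1.0 - math.exp(-p_max / a))
    return [g_max - b * (1.0 - math.exp(-n / a)) for n in range(p_max + 1)]


# Hypothetical device: 1-100 uS window, 50 pulses, nonlinearity constant a = 15 pulses.
ltp = potentiation_curve(1e-6, 100e-6, 50, 15.0)
ltd = depression_curve(1e-6, 100e-6, 50, 15.0)
first_step = ltp[1] - ltp[0]   # large conductance step at the start of potentiation
last_step = ltp[-1] - ltp[-2]  # small step near saturation
```

Fitting $A$ separately for the SET and RESET branches gives a simple measure of the asymmetry and nonlinearity compared across devices.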



Figure 7 Schematic drawings of (a) 3D X-point ReRAM, (b) Vertical ReRAM. This figure is adapted from [78].



Figure 8 Cycling endurance of devices with switching-layer diameters of 100 nm, 50 nm, and 30 nm. Endurance improves to 10<sup>4</sup> cycles as the switching-layer diameter is scaled down to 30 nm. This figure is adapted from [101].



Figure 9 Schematic diagrams illustrating (a) the distributions of Ag ions in the pristine state (equivalent to the HRS) and (b) the LRS (caused by the migration of Ag) in Ta/Ta<sub>2</sub>O<sub>5</sub>:Ag/Ru device. Schematic diagrams showing (c) the pristine state and (d) the forming process (oxygen vacancy mediated VCM) in a Ta/Ta<sub>2</sub>O<sub>5</sub>:Ag/Pd device. This figure is adapted from [107].



Figure 10 Circuit schematic of the "2-PCM synapse". This figure is adapted from [111].
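In the "2-PCM synapse" of [111], [147], each synaptic weight is represented by two PCM devices, one contributing positively (LTP) and one negatively (LTD), and both are programmed only with gradual crystallizing (partial SET) pulses, so depression is realized by potentiating the LTD device. The sketch below illustrates that bookkeeping under simplifying assumptions: the class name `TwoPCMSynapse`, the conductance window, the read voltage, and the rule that each pulse closes a fixed fraction of the remaining gap to $G_{max}$ are all hypothetical, and the handling of device saturation needed in a real system is not modelled.

```python
class TwoPCMSynapse:
    """Simplified 2-PCM synapse: effective weight = G_LTP - G_LTD."""

    def __init__(self, g_min=1e-6, g_max=20e-6, step_fraction=0.1):
        self.g_min = g_min
        self.g_max = g_max
        # Hypothetical fraction of the remaining window gained per partial SET pulse.
        self.step_fraction = step_fraction
        self.g_ltp = g_min  # both devices start fully amorphous (low conductance)
        self.g_ltd = g_min

    def _partial_set(self, g):
        # Each partial crystallizing pulse closes a fixed fraction of the gap to g_max.
        return g + self.step_fraction * (self.g_max - g)

    def potentiate(self):
        self.g_ltp = self._partial_set(self.g_ltp)

    def depress(self):
        # Depression also uses a SET pulse, applied to the LTD device.
        self.g_ltd = self._partial_set(self.g_ltd)

    def weight(self):
        return self.g_ltp - self.g_ltd

    def read_current(self, v_read=0.1):
        # The output neuron integrates the difference I_LTP - I_LTD.
        return v_read * self.weight()


syn = TwoPCMSynapse()
for _ in range(5):
    syn.potentiate()
w_after_ltp = syn.weight()
for _ in range(2):
    syn.depress()
w_after_ltd = syn.weight()
```

Because only the gradual SET direction is used, the abrupt RESET behavior of PCM never has to be applied during learning, which is the main motivation for the two-device arrangement.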



Figure 11 (a) Schematic illustration of the neuromorphic network with a 1T1R synapse. The PRE drives the MOS transistor gate voltage $V_G$, activating a current spike because of the low negative TE voltage ($V_{TE} = -30 \text{ mV}$) set by the POST. The current spikes are fed into the POST, which delivers a $V_{TE}$ spike back to the synapse once its internal voltage $V_{int}$ exceeds a threshold $V_{th}$. The $V_{TE}$ spike includes a set and a reset pulse to induce potentiation or depression according to the STDP protocol. (b, c) Scheme of the pulses applied by the PRE and POST neurons to the 1T1R synapse for (b) a small positive delay and (c) a small negative delay. This figure is adapted from [116].
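The scheme of Figure 11 maps the relative timing of PRE and POST spikes onto a programming action: with a small positive delay (PRE before POST) the synapse is potentiated, with a small negative delay it is depressed, and if the pulses do not overlap nothing happens. The sketch below abstracts this into a rectangular timing window; the function name `stdp_update`, the 10 ms window, and the rectangular shape are assumptions for illustration and do not reproduce the actual pulse waveforms of [116].

```python
def stdp_update(t_pre, t_post, window=10e-3):
    """Return 'set', 'reset' or None for one PRE/POST spike pair.

    Simplified rectangular STDP window (an assumption, not the waveform of [116]):
    the synapse is programmed only when the PRE gate pulse and the POST
    spike overlap, i.e. when 0 < |t_post - t_pre| <= window.
    """
    delay = t_post - t_pre  # positive when PRE fires before POST
    if 0 < delay <= window:
        return "set"    # causal pairing -> potentiation
    if -window <= delay < 0:
        return "reset"  # anti-causal pairing -> depression
    return None         # no overlap or exact coincidence: conductance unchanged


print(stdp_update(0.0, 5e-3))   # PRE leads by 5 ms
print(stdp_update(5e-3, 0.0))   # POST leads by 5 ms
print(stdp_update(0.0, 50e-3))  # outside the window
```

In hardware the same decision is made implicitly by which part of the POST spike coincides with the open transistor gate, so no explicit timing computation is required.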



Figure 12 Comparison of RRAM, CBRAM and PCM Technology



Figure 13 Simulated retention behavior of non-filamentary RRAM devices with various oxygen-vacancy migration barriers. Simulation parameters can be found in Ref. [118].



Figure 14 Simulated potentiation process of non-filamentary RRAM devices with various oxygen-vacancy migration barriers. The programming voltage is fixed at 2 V and the number of pulses at 100. Programming to a larger ratio requires a longer pulse width.
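Figures 13 and 14 together point to a trade-off: a higher oxygen-vacancy migration barrier slows spontaneous vacancy motion (better retention) but also makes programming slower (longer pulses). A simple way to see the size of the effect is the Arrhenius dependence of the thermally activated hopping rate, $r = \nu\, e^{-E_a / k_B T}$. The sketch below shows only this thermal scaling and does not reproduce the simulator of [118]; the attempt frequency $\nu = 10^{13}$ Hz, the barrier values of 1.0 eV and 1.2 eV, the 85 °C temperature, and the function names are all assumptions chosen for illustration.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def hop_rate(barrier_ev, temp_k, attempt_freq_hz=1e13):
    """Arrhenius hopping rate of an oxygen vacancy over a migration barrier.

    attempt_freq_hz is an assumed typical phonon frequency, not a value from [118].
    """
    return attempt_freq_hz * math.exp(-barrier_ev / (K_B_EV * temp_k))


def characteristic_time(barrier_ev, temp_k, attempt_freq_hz=1e13):
    """Mean waiting time between hops, used here as a rough proxy for state relaxation."""
    return 1.0 / hop_rate(barrier_ev, temp_k, attempt_freq_hz)


bake_temp = 85 + 273.15  # K, chosen for illustration
ratio = characteristic_time(1.2, bake_temp) / characteristic_time(1.0, bake_temp)
```

Under this simple model, raising the barrier by 0.2 eV lengthens the characteristic time at 85 °C by a factor of roughly 650; the ratio is independent of the assumed attempt frequency, since it cancels.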



Figure 15 (a) Simulated retention behavior of filamentary RRAM devices with a single strong CF. 100 devices starting from the same initial state are simulated and shown. The baking temperature is 85 °C. Simulation details and other parameters can be found in Ref. [9]. Inset: current density distribution of the RRAM device; its order parameter is 0.67. (b) Simulated retention behavior of filamentary RRAM devices with multiple weak CFs. Other conditions are the same as in (a). Inset: current density distribution; its order parameter is 0.46. Retention is worse in this case.



Figure 16 (a) The two-layer multilayer perceptron (MLP) neural network (NN). The input MNIST images are cropped and binarized (black/white) for simplicity. (b) In the MLP simulator, the weights $W_{IH}$ and $W_{HO}$ are implemented with synaptic arrays in which each synaptic device exhibits non-ideal properties.



Figure 17 Schematic illustration of non-ideal synaptic device properties modeled in the MLP simulator, including (1) nonlinear weight update, (2) weight precision, (3) device-to-device weight update variation, (4) cycle-to-cycle weight update variation, (5) dynamic range (conductance ON/OFF ratio) and (6) conductance variation.
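Several of the non-idealities listed in Figure 17 can be illustrated with a few lines of code: the ON/OFF ratio limits the usable conductance window, weight precision limits the number of distinguishable states, device-to-device variation gives each device a fixed offset, and cycle-to-cycle variation adds fresh noise on every programming event. The following is a simplified sketch assuming Gaussian relative variation; it does not reproduce the models used in NeuroSim+ [133], nonlinear weight update and conductance range variation are left out, and the function names and parameter values are hypothetical.

```python
import random


def weight_to_conductance(w, g_max, on_off_ratio, n_levels):
    """Map a normalized weight w in [0, 1] onto one of n_levels conductance states.

    The usable window is [g_max / on_off_ratio, g_max]: a small ON/OFF ratio
    shrinks the dynamic range, and n_levels sets the weight precision.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be normalized to [0, 1]")
    if n_levels < 2:
        raise ValueError("need at least two conductance levels")
    g_min = g_max / on_off_ratio
    # Snap to the nearest available state (Python's round() breaks ties to even).
    level = round(w * (n_levels - 1))
    return g_min + level * (g_max - g_min) / (n_levels - 1)


def make_device_gain(d2d_sigma, rng):
    """Fixed relative gain of one physical device (device-to-device variation)."""
    return 1.0 + rng.gauss(0.0, d2d_sigma)


def program(g_target, device_gain, c2c_sigma, rng):
    """Conductance actually reached in one programming event.

    Combines the device's fixed gain with fresh cycle-to-cycle noise and
    clips at zero, since a conductance cannot be negative.
    """
    return max(0.0, g_target * device_gain * (1.0 + rng.gauss(0.0, c2c_sigma)))


# Hypothetical device: 10 uS maximum, ON/OFF ratio of 10, 3-bit (8-level) precision.
rng = random.Random(0)
levels = sorted({weight_to_conductance(i / 100, 10e-6, 10, 8) for i in range(101)})
g_target = weight_to_conductance(0.5, 10e-6, 10, 8)
gain = make_device_gain(0.05, rng)  # drawn once per device
samples = [program(g_target, gain, 0.02, rng) for _ in range(5)]  # redrawn every cycle
```

Sweeping `on_off_ratio`, `n_levels`, and the two sigma values in such a model is one way to reproduce qualitatively the kind of sensitivity study summarized in Figure 18.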



Figure 18 The impact of (a) weight precision, (b) conductance ON/OFF ratio, (c) weight update asymmetry/nonlinearity, (d) conductance range variation, (e) device-to-device variation and (f) cycle-to-cycle variation in online learning and/or offline classification [133].



Device stacks (legend of Figure 19):

1. Ta/TaO<sub>x</sub>/TiO<sub>2</sub>/Ti
2. Pt/Ti/TiO<sub>2-x</sub>/Al<sub>2</sub>O<sub>3</sub>/Pt
3. Pt/Cr/a-Si:Ag/a-Si/W
4. Al/HfO<sub>2</sub>/Ti/TiN
5. Ni/SiN<sub>x</sub>/AlO<sub>x</sub>/TiN
6. Pt/GeSO/TiN
7. TiN/TaO<sub>x</sub>/Pt
8. TiN/SiO<sub>x</sub> (1 nm)/TaO<sub>x</sub>/Pt
9. TiN/HfO<sub>x</sub>/AlO<sub>x</sub>/Pt
10. Ag/MoO<sub>x</sub>/FTO
11. Ta/HfO<sub>2</sub>/Al-doped TiO<sub>2</sub>/TiN
12. TiN/TaO<sub>x</sub>/HfAl<sub>y</sub>O<sub>x</sub>/TiN
13. TiN/TiO<sub>x</sub>/Ta<sub>2</sub>O<sub>5</sub>/TiN
14. TiN/TEL/HfO<sub>x</sub>/TiN
15. TiN/PCMO/Pt

Figure 19 (a) Pulse width versus pulse amplitude used for gradual conductance switching in RRAM and CBRAM devices reported in the literature. (b) Conductance range of these devices during switching as a function of pulse amplitude. The device data are taken from [69], [70], [80], [82], [92], [135], [137]–[139], [155]–[159].