

# Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis

İsmail Emir Yüksel Yahya Can Tuğrul Ataberk Olgun F. Nisa Bostancı A. Giray Yağlıkçı  
Geraldo F. Oliveira Haocong Luo Juan Gómez-Luna Mohammad Sadrosadati Onur Mutlu

ETH Zürich

*Processing-using-DRAM (PuD) is an emerging paradigm that leverages the analog operational properties of DRAM circuitry to enable massively parallel in-DRAM computation. PuD has the potential to significantly reduce or eliminate costly data movement between processing elements and main memory. A common approach for PuD architectures is to make use of bulk bitwise computation (e.g., AND, OR, NOT). Prior works experimentally demonstrate three-input MAJ (i.e., MAJ3) and two-input AND and OR operations in commercial off-the-shelf (COTS) DRAM chips. Yet, demonstrations on COTS DRAM chips do not provide a functionally complete set of operations (e.g., NAND or AND and NOT).*

*We experimentally demonstrate that COTS DRAM chips are capable of performing 1) functionally-complete Boolean operations: NOT, NAND, and NOR and 2) many-input (i.e., more than two-input) AND and OR operations. We present an extensive characterization of new bulk bitwise operations in 256 off-the-shelf modern DDR4 DRAM chips. We evaluate the reliability of these operations using a metric called success rate: the fraction of correctly performed bitwise operations. Among our 19 new observations, we highlight four major results. First, we can perform the NOT operation on COTS DRAM chips with a 98.37% success rate on average. Second, we can perform up to 16-input NAND, NOR, AND, and OR operations on COTS DRAM chips with high reliability (e.g., 16-input NAND, NOR, AND, and OR with an average success rate of 94.94%, 95.87%, 94.94%, and 95.85%, respectively). Third, data pattern only slightly affects NAND, NOR, AND, and OR operations. Our results show that executing NAND, NOR, AND, and OR operations with random data patterns decreases the success rate compared to all logic-1/logic-0 patterns by 1.39%, 1.97%, 1.43%, and 1.98%, respectively. Fourth, NOT, NAND, NOR, AND, and OR operations are highly resilient to temperature changes, with small success rate fluctuations of at most 1.66% among all the tested operations when the temperature is increased from 50°C to 95°C. We believe these empirical results demonstrate the promising potential of using DRAM as a computation substrate. To aid future research and development, we open-source our infrastructure at <https://github.com/CMU-SAFARI/FCDRAM>.*

## 1. Introduction

Modern systems are processor-centric [1, 2]: they require frequent data movement between processing elements (e.g., CPU, GPU, TPU, and FPGA) and main memory (DRAM), leading to significant inefficiencies in performance and energy consumption [1–28]. Data movement from/to main memory is an increasingly significant bottleneck across a wide variety of computing systems and applications [12, 13]. Processing-using-DRAM (PuD) [29–32] is a promising paradigm that can alleviate the data movement bottleneck. PuD uses the analog operational properties of the DRAM circuitry to enable massively parallel in-DRAM computation. Many prior works [29–53] demonstrate that PuD can greatly reduce or eliminate data movement.

A widely used approach for PuD is to perform bulk bitwise operations, i.e., bitwise operations on large bit vectors. To perform bulk bitwise operations using DRAM, prior works propose modifications to the DRAM circuitry [29–31, 33, 35, 36, 43, 44, 46, 48–58]. Recent works [38, 41, 42, 45] experimentally demonstrate the feasibility of executing data copy & initialization [42, 45], i.e., the RowClone operation [49], and a subset of bitwise operations, i.e., three-input bitwise majority (MAJ3) and two-input AND and OR operations in unmodified commercial off-the-shelf (COTS) DRAM chips by operating beyond manufacturer-recommended DRAM timing parameters. To do so, these works [38, 41, 42, 45] issue carefully engineered sequences of DRAM commands that allow sequentially or simultaneously activating multiple DRAM rows. By simultaneously activating multiple DRAM rows, COTS DRAM chips can perform the MAJ3 operation in a bulk bitwise manner among activated DRAM rows, which can be used to implement two-input AND and OR operations.

While the demonstration of MAJ3 and two-input AND and OR operations in COTS DRAM is clearly interesting and shows the potential of investigating PuD as a serious computation substrate, prior works [38, 41, 45] do *not* provide the demonstration of 1) a functionally-complete set of operations (e.g., AND and NOT or NAND/NOR) or 2) AND and OR operations with more than two inputs (e.g., a 16-input AND operation).

In this paper, we experimentally demonstrate that COTS DDR4 DRAM chips are capable of performing 1) NOT, NAND, and NOR operations and 2) many-input (i.e., up to 16-input) NAND, NOR, AND, and OR operations. Doing so allows a COTS DDR4 chip to provide a *functionally-complete* set of many-input Boolean operations. We present an extensive characterization of the *new* bulk bitwise operations (i.e., NOT, NAND, and NOR) and many-input NAND, NOR, AND, and OR operations in 256 COTS DDR4 chips. We evaluate the reliability of these Boolean operations using numerous data patterns and at different DRAM chip temperatures. To quantitatively evaluate the reliability of a Boolean operation, we measure its *success rate* per DRAM cell as the fraction of correctly performed bitwise operations over 10000 trials.
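The success-rate metric, together with the *average* success rate defined in footnote 1, can be restated as the minimal Python sketch below. The function names and the input layout (a list of read-back bits per cell, and one list of per-cell rates per chip) are hypothetical; the sketch only encodes the two definitions and is not part of the released infrastructure.

```python
from statistics import mean


def cell_success_rate(observed_bits, expected_bit):
    """Fraction of trials in which one DRAM cell held the correct result bit."""
    if not observed_bits:
        raise ValueError("need at least one trial")
    correct = sum(1 for bit in observed_bits if bit == expected_bit)
    return correct / len(observed_bits)


def average_success_rate(rates_per_chip):
    """Mean of every tested cell's success rate, pooled across all tested chips.

    rates_per_chip: one list of per-cell success rates for each chip (hypothetical layout).
    """
    all_rates = [rate for chip_rates in rates_per_chip for rate in chip_rates]
    return mean(all_rates)


# Example: a cell that is correct in 9,837 of 10,000 trials.
trials = [1] * 9837 + [0] * 163
print(cell_success_rate(trials, expected_bit=1))  # 0.9837
```

Pooling all cells before averaging (rather than averaging per-chip means) follows the footnote's wording; the two differ when chips contribute different numbers of cells.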

Based on our characterization of NOT and many-input NAND, NOR, AND, and OR operations, we make 19 new empirical observations and share five key takeaway lessons from our observations. We highlight four of our major new results. First, off-the-shelf DRAM chips are capable of performing the NOT operation. We observe that we can perform bulk bitwise NOT on COTS DRAM chips with an average success rate of 98.37%.<sup>1</sup> Second, we can perform NAND, NOR, AND, and OR operations on COTS DRAM chips with {2, 4, 8, 16} inputs. Our results show that we can execute 16-input NAND, NOR, AND, and OR operations with an average success rate of 94.94%, 95.87%, 94.94%, and 95.85%, respectively. Third, data pattern affects the success rate of NAND, NOR, AND, and OR operations, but only slightly. We observe that executing NAND, NOR, AND, and OR operations with random data patterns decreases the success rate compared to all logic-1/logic-0 patterns by 1.39%, 1.97%, 1.43%, and 1.98%, respectively. Fourth, NOT, NAND, NOR, AND, and OR operations are highly resilient to temperature changes on the DRAM chip. Our results show that an increase in temperature from 50°C to 95°C reduces the success rate by at most only 1.66% across all tested bulk bitwise operations.

We explain how a COTS DRAM chip can perform 1) NOT and 2) many-input NAND, NOR, AND, and OR operations by providing two hypotheses about the underlying operation of COTS DRAM chips. We describe these two hypotheses in detail in §5.1 and §6.1.

First, in the widely adopted [59–62] open-bitline DRAM architecture [62–66], *two relatively far apart DRAM cells* (e.g., Cell A and Cell B in Fig. 1a-❶) connect to *two opposite terminals* of a sense amplifier (Fig. 1a-❶) via access transistors. Shortly after a sense amplifier is enabled to access a DRAM cell, the opposite terminals of the sense amplifier are fundamentally driven by inverted voltage levels (i.e., inverted logic values) due to how the sense amplifier operates [60, 62–65, 67] (e.g., A and  $\sim A$  in Fig. 1a-❷). During standard DRAM operation, *only one* DRAM cell is connected to one of the terminals, and thus, the other terminal is driven by the inverse of the connected cell’s value [60–65, 67]. We hypothesize that when we *simultaneously connect* the two DRAM cells to the two sense amplifier terminals (by simultaneously activating the rows of these cells), we can negate the value stored in one cell and store the negated value in the other cell (Fig. 1a-❸).
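The hypothesized NOT behavior can be illustrated with an idealized Python model. It is a sketch under stated assumptions, not a circuit simulation. The capacitance values are arbitrary, and noise and timing are ignored. We also assume that the source cell's charge sharing alone determines which way the sense amplifier latches. After that, the destination cell on the opposite terminal is overwritten with the complementary full-swing voltage.

```python
VDD = 1.0  # normalized supply voltage; GND = 0.0


def charge_share(v_bitline, c_bitline, cells):
    """Bitline voltage after connecting capacitors given as (capacitance, voltage) pairs."""
    total_c = c_bitline + sum(c for c, _ in cells)
    return (c_bitline * v_bitline + sum(c * v for c, v in cells)) / total_c


def sense(v_term, v_term_bar):
    """Idealized sense amplifier: the higher terminal goes to VDD, the lower one to GND."""
    if v_term == v_term_bar:
        raise ValueError("no voltage difference to amplify")
    return (VDD, 0.0) if v_term > v_term_bar else (0.0, VDD)


def in_dram_not(src_bit, c_bitline=4.0, c_cell=1.0):
    """Return the bit left in the destination cell (hypothetical capacitance ratio)."""
    v_pre = VDD / 2  # both terminals precharged to VDD/2
    v_src = VDD if src_bit else 0.0
    # Assumption: only the source cell has shared charge when the sense amplifier fires.
    v_term = charge_share(v_pre, c_bitline, [(c_cell, v_src)])
    v_term, v_term_bar = sense(v_term, v_pre)
    # The destination cell sits on the opposite terminal and is driven to its voltage.
    v_dst = v_term_bar
    return 1 if v_dst == VDD else 0


print([in_dram_not(b) for b in (0, 1)])  # [1, 0]
```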



**Figure 1: Demonstration of how modern DRAM can provide the NOT operation (a) and two-input AND and NAND operations (b).**

<sup>1</sup>We define the *average* success rate of a bitwise operation as the mean of all tested DRAM cells’ success rate across all tested DRAM chips.

Second, a sense amplifier operates in two steps to access a DRAM cell: it i) compares the voltage levels of its two opposite terminals (i.e., ❶ in Fig. 1b) and ii) amplifies the voltage difference between them (❷ in Fig. 1b) [39, 59, 60, 62–65, 67–69]. This voltage difference (at the beginning of a DRAM cell access operation) is mainly a function of the value stored in the accessed DRAM cell. One terminal of the sense amplifier connects to the accessed DRAM cell, while the other terminal (i.e., the reference terminal) is driven to a fixed voltage level called the *reference voltage* (i.e.,  $V_{REF}$  in Fig. 1b-❶) [20, 60, 62–65, 67]. We hypothesize that simultaneously activating multiple DRAM cells connected to both terminals allows us to manipulate the reference voltage before the sense amplifier amplifies the voltage difference between the terminals (e.g.,  $V_{AND}$  in Fig. 1b-❷). This manipulation enables the voltage difference between the two sense amplifier terminals to express a wider variety of functions than just the value stored in an accessed DRAM cell. By carefully initializing the DRAM cells connected to the reference terminal (e.g., Cell C and Cell D in Fig. 1b), we can set  $V_{REF}$  to a desired voltage level (e.g.,  $V_{AND}$  in Fig. 1b-❷) that enables us to express two functions: AND and OR operations on the cells connected to the other terminal (e.g., Cell A and Cell B in Fig. 1b-❶, ❷). The key idea for implementing a Boolean AND operation (which is similar to that of a Boolean OR operation) is to set  $V_{AND}$  such that 1) it is *lower* than the voltage level created on the other sense amplifier terminal *only when both A and B store a logic-1 value*, so that the sense amplifier outputs a logic-1 (i.e., the result of  $AND(logic-1, logic-1)$), and 2) it is *higher* than the voltage level created on the other sense amplifier terminal *for all other combinations of values stored in A and B*, so that the sense amplifier outputs a logic-0. Expressing the AND (OR) operation on one terminal results in the NAND (NOR) operation on the other terminal, because the sense amplifier's back-to-back inverters (i.e., NOT gates) always drive its two terminals to opposite values. For example, in Fig. 1b-❸, one terminal becomes  $F=AND(A, B)$  while the other terminal becomes  $\sim F=NAND(A, B)$.
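The reference-voltage idea can be sketched with an idealized charge-sharing model in Python. Several things are assumptions made for illustration only. The capacitance ratio is arbitrary. $V_{REF}$ is placed directly halfway between the two compute-terminal levels that separate the operation's outputs. The model does not capture how the real chip reaches that level by initializing the reference cells, nor any noise or process variation.

```python
VDD = 1.0  # normalized supply voltage


def terminal_voltage(bits, c_bitline=4.0, c_cell=1.0):
    """Voltage on a terminal precharged to VDD/2 after len(bits) cells share charge with it."""
    n = len(bits)
    return (c_bitline * VDD / 2 + c_cell * VDD * sum(bits)) / (c_bitline + n * c_cell)


def reference_voltage(n, op):
    """Hypothetical V_REF placed between the two levels that separate the op's outputs."""
    def level(k):  # terminal voltage when k of the n inputs store logic-1
        return terminal_voltage([1] * k + [0] * (n - k))
    if op == "AND":
        return (level(n - 1) + level(n)) / 2  # only the all-ones case exceeds V_REF
    if op == "OR":
        return (level(0) + level(1)) / 2      # any logic-1 input exceeds V_REF
    raise ValueError(f"unsupported op: {op}")


def in_dram_op(bits, op):
    """Return (F, ~F): AND/OR on the compute terminal, NAND/NOR on the reference terminal."""
    f = 1 if terminal_voltage(bits) > reference_voltage(len(bits), op) else 0
    return f, 1 - f


print(in_dram_op([1, 1, 1, 1], "AND"))  # (1, 0)
print(in_dram_op([1, 0, 1, 1], "AND"))  # (0, 1)
```

In this toy model, adjacent voltage levels are separated by `c_cell * VDD / (c_bitline + n * c_cell)`, so the sensing margin shrinks as the number of inputs `n` grows.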

This paper makes the following key contributions:

- To our knowledge, this is the first work to experimentally demonstrate that unmodified off-the-shelf DRAM chips are capable of performing 1) functionally-complete Boolean operations (i.e., NOT, NAND, and NOR) and 2) many-input (i.e., more than two-input) NAND, NOR, AND and OR operations.
- We extensively characterize the reliability of these previously undemonstrated bulk bitwise operations on 256 modern DDR4 DRAM chips from 22 DRAM modules. Our results show that we can reliably perform NOT operations and {2, 4, 8, 16}-input NAND, NOR, AND, and OR operations on COTS DRAM chips at high success rates (>94%).
- We quantitatively evaluate the impact of two significant factors: 1) data pattern dependence and 2) DRAM chip temperature on the success rate of bulk bitwise operations in DRAM chips. Our results show that the effect of data pattern dependence and temperature on the average success rate of bulk bitwise operations is small, i.e.,  $\leq 1.98\%$  and  $\leq 1.66\%$ , respectively.

We believe that our experimental results fill a large gap in the understanding of COTS DRAM chips’ computational capabilities. By demonstrating the capability of performing functionally-complete Boolean logic in COTS DRAM chips and characterizing this behavior, we demonstrate the potential of using DRAM as a powerful computation substrate. To aid future research and development, we open-source our infrastructure at <https://github.com/CMU-SAFARI/FCDRAM>.

## 2. Background

We provide a brief background on DRAM organization, DRAM commands, DRAM operation, and Processing-using-DRAM (PuD) in commercial off-the-shelf (COTS) chips. For more detailed background on these, we refer the reader to many prior works [20, 22, 29–31, 33–37, 42–46, 48–53, 68, 70–93].

### 2.1. DRAM Organization & Operation

A modern DRAM system consists of a hierarchy of components: channels, modules, ranks, chips, banks, and subarrays. In a typical system configuration, a CPU chip includes a set of memory controllers, where each memory controller interfaces with a DRAM channel to serve memory requests (i.e., reads or writes). A DRAM channel can host multiple DRAM modules, each of which implements a single or multiple DRAM ranks. A DRAM rank consists of multiple DRAM chips that operate in lock-step, i.e., all chips simultaneously perform the same operation.

**DRAM Subarrays.** Fig. 2 shows the hierarchical organization of a single DRAM chip that consists of multiple DRAM banks, DRAM subarrays, and DRAM cells. A *bank* comprises multiple *subarrays*, each containing a two-dimensional array of DRAM cells. A *cell* consists of an access transistor and a storage capacitor that encodes a single bit of data using its voltage level, either the supply voltage level (VDD) or the ground voltage level (GND). Cells in a row share a *wordline* that drives each cell’s access transistor. DRAM cells in the same column share a *bitline*, which is used to read from and write to the cells via the row buffer (which contains sense amplifiers). A *sense amplifier* consists of two back-to-back inverters (i.e., NOT gates) and functions as a comparator: it compares the voltages on its two terminals (i.e., bitlines) and amplifies the difference, driving the higher-voltage terminal to VDD and the lower-voltage terminal to GND. However, because a sense amplifier is much larger than a cell, only enough sense amplifiers to sense half of the cells in a row fit alongside the row [60, 68]. To sense the entire row, each subarray has bitlines that connect to two rows of sense amplifiers, one above and one below the subarray, which causes neighboring subarrays to share half of the sense amplifiers. This approach, known as the *open-bitline architecture*, is widely adopted in high-density DRAM [59–61, 63–66, 68, 94]. For simplicity, we assume that all DRAM cells store VDD for logic-1 and GND for logic-0 in the remainder of this study.
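The sharing of sense amplifiers between neighboring subarrays can be made concrete with a small Python model of the open-bitline layout. The exact wiring of a COTS chip is not public, so the convention used here is a hypothetical simplification: even columns connect to the stripe above a subarray and odd columns to the stripe below.

```python
def stripe_of(subarray, column):
    """Sense-amplifier row (stripe) that a bitline connects to.

    Assumed convention: stripe s lies above subarray s and stripe s + 1 below it;
    even columns use the stripe above and odd columns the stripe below.
    """
    return subarray if column % 2 == 0 else subarray + 1


def shared_sense_amps(upper, num_columns):
    """(column in `upper`, column in `upper + 1`) pairs that meet at one sense amplifier."""
    shared_stripe = upper + 1
    from_upper = [c for c in range(num_columns) if stripe_of(upper, c) == shared_stripe]
    from_lower = [c for c in range(num_columns) if stripe_of(upper + 1, c) == shared_stripe]
    # The j-th bitline from each side lands on the two terminals of the j-th sense amplifier.
    return list(zip(from_upper, from_lower))


pairs = shared_sense_amps(upper=0, num_columns=8)
print(pairs)  # [(1, 0), (3, 2), (5, 4), (7, 6)]
```

Under this assumed layout, exactly half of each subarray's bitlines reach a given neighbor. This is consistent with the expectation in §4.2 that only half of the cells in $R_F$'s subarray are affected.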

**DRAM Commands.** To serve main memory requests, the memory controller issues DRAM commands, e.g., row activation (ACT), bank precharge (PRE), data read (RD), data write (WR), and refresh (REF). To perform a read or write operation, the memory controller first needs to open a row, i.e., copy the data of the cells in the row to the row buffer. To open a row, the memory controller issues an ACT command to a bank by specifying the address of the row to open. After activation completes, the memory controller issues either a RD or a WR command to read or write a DRAM word (which is typically equal to 64 bytes) within the activated row. To access data from another DRAM row in the same bank, the memory controller must first close the currently open row by issuing a PRE command. The memory controller also periodically issues REF commands to prevent data loss due to charge leakage.

Figure 2: Bank and subarray organization in a DRAM chip.

**DRAM Cell Operation.** We describe DRAM cell operations by explaining the steps in activating a cell. The memory controller initiates each step by issuing a DRAM command. Each step takes a certain amount of time to complete, and thus, a DRAM command is associated with timing constraints known as timing parameters. In Fig. 3, we show how the states of a cell and its sense amplifier change during the steps involved in an activation operation. Each DRAM cell diagram corresponds to the state of the cell at exactly the tick mark on the time axis, and asserted signals are highlighted in red. The memory controller issues each command (shown in orange boxes below the time axis) at the corresponding tick mark. Initially, 1) the cell capacitor stores VDD, 2) the cell is precharged, 3) the sense amplifier is disabled, and 4) its bitlines’ (i.e., bitline and bitline-bar) voltages are stable at VDD/2 (1).



Figure 3: Command sequence for activating a DRAM row and the state of a DRAM cell during each related step.

To activate the cell, the memory controller issues an ACT command (a). As a result, the row decoder asserts the wordline and thus connects the cell capacitor to the bitline. The charge in the cell capacitor begins to perturb the bitline voltage, a process known as *charge sharing* (2). Charge sharing continues until the bitline voltage reaches a level that the sense amplifier can safely amplify, VDD/2+ $\epsilon$  (2). The sense amplifier then kicks in to amplify the difference between the bitline and the bitline-bar. Depending on the cell’s charge, the bitline becomes either VDD or GND, while the bitline-bar is driven to the opposite voltage. As the cell in Fig. 3 initially stores VDD, the bitline becomes VDD, whereas the bitline-bar becomes GND at the end of the amplification (3). At this point, the memory controller can read from or write to the cell using RD/WR commands. To return a cell to its precharged state, the voltage in the cell must first be fully restored, which requires waiting for the  $t_{RAS}$  timing parameter (i.e., the minimum time between ACT and PRE commands) (b). Once the cell is restored, the memory controller can issue a PRE command (c). The cell returns to the precharged state (5), the same as the initial state (1), where the memory controller can reliably issue an ACT command after waiting for the  $t_{RP}$  timing parameter (d).
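The size of the perturbation $\epsilon$ follows from charge conservation. Under the textbook idealization, a cell capacitance $C_{C}$ holding $V_{cell}$ shares charge with a bitline capacitance $C_{BL}$ precharged to $V_{DD}/2$. The symbols $C_{C}$, $C_{BL}$, and $V_{cell}$ are introduced here for illustration, and parasitics and leakage are ignored:

$$
V_{BL} = \frac{C_{BL}\,\frac{V_{DD}}{2} + C_{C}\,V_{cell}}{C_{BL} + C_{C}}
= \frac{V_{DD}}{2} + \underbrace{\frac{C_{C}}{C_{BL} + C_{C}}\left(V_{cell} - \frac{V_{DD}}{2}\right)}_{\epsilon}
$$

For a cell storing VDD, $\epsilon = \frac{C_{C}\,V_{DD}}{2\,(C_{BL} + C_{C})} > 0$. For a cell storing GND, $\epsilon$ has the same magnitude with a negative sign. Because $C_{BL}$ is typically several times larger than $C_{C}$, $\epsilon$ is small, which is why the sense amplifier must amplify it.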

## 2.2. Processing Using COTS DRAM Chips

Processing-using-DRAM (PuD) is an emerging paradigm that can alleviate the bottleneck caused by frequent data movement between processing elements (e.g., CPU) and main memory (DRAM). PuD enables massively parallel in-DRAM computation by leveraging intrinsic analog operational properties of the DRAM circuitry, as initially shown by [29–31, 48–52].

Recent works [37, 38, 41, 42, 45] experimentally demonstrate that COTS DRAM chips can perform 1) data copy & initialization and 2) bitwise operations, by carefully crafting a set of DRAM commands (i.e., ACT → PRE → ACT) with reduced timings (i.e., violating the  $t_{RAS}$  and  $t_{RP}$  timing parameters).
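Whether a command sequence is a normal row access or one of these reduced-timing sequences depends on two gaps, which a small Python bookkeeping sketch makes explicit. The `Command` type and the numeric timing values are hypothetical and illustrative only. This is not DRAM Bender's programming interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    time_ns: float          # issue time relative to the first command
    kind: str               # "ACT" or "PRE"
    row: Optional[int] = None


def timing_violations(commands, t_ras_ns, t_rp_ns):
    """Return (parameter, gap_ns) for each ACT->PRE gap below tRAS and PRE->ACT gap below tRP."""
    violations = []
    for prev, nxt in zip(commands, commands[1:]):
        gap = nxt.time_ns - prev.time_ns
        if prev.kind == "ACT" and nxt.kind == "PRE" and gap < t_ras_ns:
            violations.append(("tRAS", gap))
        elif prev.kind == "PRE" and nxt.kind == "ACT" and gap < t_rp_ns:
            violations.append(("tRP", gap))
    return violations


# Illustrative nominal values, not taken from the datasheet of any tested module.
T_RAS_NS, T_RP_NS = 32.0, 13.5

r_f, r_l = 10, 700  # hypothetical row addresses
reduced = [Command(0.0, "ACT", r_f), Command(1.5, "PRE"), Command(3.0, "ACT", r_l)]
print(timing_violations(reduced, T_RAS_NS, T_RP_NS))  # [('tRAS', 1.5), ('tRP', 1.5)]
```

A sequence that respects both parameters (e.g., PRE at 32 ns and the second ACT at 45.5 ns) yields an empty list and behaves as two ordinary, separate row accesses.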

**In-DRAM Data Copy & Initialization.** RowClone [49] enables data movement within a DRAM subarray at DRAM row granularity by modifying DRAM circuitry. RowClone alleviates the energy and execution time costs of transferring data between DRAM and the processing units. Recent works [42, 45] experimentally demonstrate that the RowClone operation can be performed in COTS DRAM chips by sequentially activating two DRAM rows in the same subarray.

**In-DRAM Bitwise Operations.** Prior work [29] introduces the concept of simultaneously activating three rows in a DRAM subarray (i.e., triple-row activation) through modifications to the DRAM circuitry. This modification enables a three-input bitwise majority (MAJ3) operation among activated rows, which can be used to implement two-input bitwise AND and OR operations. Prior works [38, 41, 45] demonstrate that COTS DRAM chips are capable of performing MAJ3 and two-input AND and OR operations by simultaneously activating multiple rows in the same subarray.
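The reduction from MAJ3 to two-input AND and OR is the identity MAJ3(A, B, 0) = AND(A, B) and MAJ3(A, B, 1) = OR(A, B). The Python sketch below models each DRAM row as an integer bit vector. It captures only the logic, not how the third row is initialized or activated in DRAM. It also shows why MAJ3 with constant rows is not functionally complete on its own: every result is a monotone function of the inputs, so NOT cannot be built.

```python
def maj3(a, b, c):
    """Bitwise three-input majority over bit vectors stored as Python ints."""
    return (a & b) | (b & c) | (a & c)


def and2(a, b):
    # Third row initialized to all logic-0: MAJ3(A, B, 0) = A AND B.
    return maj3(a, b, 0)


def or2(a, b, width):
    # Third row initialized to all logic-1: MAJ3(A, B, 1) = A OR B.
    all_ones = (1 << width) - 1
    return maj3(a, b, all_ones)


a, b = 0b1100, 0b1010
print(bin(and2(a, b)), bin(or2(a, b, width=4)))  # 0b1000 0b1110
```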

## 2.3. Motivation and Goal

PuD is a promising paradigm that can alleviate the data movement bottleneck and thus improve the overall energy efficiency and performance of DRAM-based systems. To show the potential of using DRAM as a serious computation substrate, it is important to understand and characterize the computational capability of existing COTS DRAM chips.

While the demonstration of MAJ3 and two-input AND and OR operations in COTS DRAM is clearly interesting and shows the potential of investigating PuD as a serious computation substrate, prior works [38, 41, 45] do *not* provide the demonstration of 1) a functionally-complete set of operations (e.g., OR and NOT or NAND/NOR) or 2) AND and OR operations with more than two inputs (e.g., a 16-input OR operation).

Our goal is to 1) understand whether COTS DRAM chips are capable of executing functionally-complete operations and bitwise operations with more than two inputs and 2) rigorously characterize the reliability of executing bitwise operations in COTS DRAM chips.

## 3. Methodology

We describe our commercial off-the-shelf (COTS) DRAM testing infrastructure (§3.1) and the COTS DDR4 DRAM chips tested for our characterization study (§3.2). We explain the methodology of our different characterization experiments in their corresponding sections (§4, §5, and §6).

### 3.1. COTS DRAM Testing Infrastructure

We conduct COTS DRAM chip experiments using DRAM Bender [41, 95], an FPGA-based DDR4 testing infrastructure that provides precise control of DDR4 commands issued to a DRAM module. Fig. 4 shows our experimental setup that consists of four main components: 1) a host machine that generates the test program and collects experiment results, 2) an FPGA development board [96], programmed with DRAM Bender, 3) a thermocouple temperature sensor and heater pads pressed against the DRAM chips to maintain a target temperature level, and 4) a temperature controller (MaxWell FT200 [97]) that keeps the temperature at the desired level.



Figure 4: Our DRAM Bender based experimental setup.

### 3.2. COTS DDR4 DRAM Chips Tested

Table 1 lists the 256 (22) COTS DDR4 DRAM chips (modules) on which we focus our analysis, along with the chip manufacturer (Chip Mfr.), the number of modules (#Modules), the number of chips (#Chips), die revision (Die Rev.), the DRAM module’s manufacturing date (Mfr. Date) in the form of year-week, chip density (Chip Density), chip organization (Chip Org.), and DRAM speed rate (Speed Rate) in megatransfers per second (MT/s).<sup>2</sup>

To investigate whether our characterization study applies to different DRAM technologies, designs, and manufacturing processes, we test a total of 280 (28) COTS DDR4 DRAM chips (modules) from all three major manufacturers (i.e., SK Hynix, Samsung, and Micron) with different die densities and die revisions from each DRAM chip manufacturer. While we observe successful NOT, NAND, NOR, AND, and OR operations in all tested SK Hynix modules, in Samsung chips we observe only successful NOT operations, and in Micron chips we observe no successful bitwise operations. Thus, we focus our analysis on the 256 (22) COTS DDR4 DRAM chips (modules) from SK Hynix and Samsung listed in Table 1. We provide much more detail on other chips in the extended version of this paper [101]. We hypothesize why we do not observe all tested bitwise operations in Samsung and Micron chips in §7.

<sup>2</sup>The technology node of a DRAM chip is usually not publicly available. Prior works [98–100] assume that for a given chip manufacturer and chip density, the alphabetical order of die revision codes may provide an indication of technology node advancement.

**Table 1: Summary of DDR4 DRAM chips tested.**

| Chip Mfr. | #Modules (#Chips) | Die Rev. | Mfr. Date <sup>a</sup> | Chip Density | Chip Org. | Speed Rate |
|-----------|-------------------|----------|------------------------|--------------|-----------|------------|
| SK Hynix  | 9 (72)            | M        | N/A                    | 4Gb          | x8        | 2666MT/s   |
|           | 5 (40)            | A        | N/A                    | 4Gb          | x8        | 2133MT/s   |
|           | 1 (16)            | A        | N/A                    | 8Gb          | x8        | 2666MT/s   |
|           | 1 (32)            | A        | 18-14                  | 4Gb          | x4        | 2400MT/s   |
|           | 1 (32)            | A        | 16-49                  | 8Gb          | x4        | 2400MT/s   |
|           | 1 (32)            | M        | 16-22                  | 8Gb          | x4        | 2666MT/s   |
| Samsung   | 1 (8)             | F        | 21-02                  | 4Gb          | x8        | 2666MT/s   |
|           | 2 (16)            | D        | 21-10                  | 8Gb          | x8        | 2133MT/s   |
|           | 1 (8)             | A        | 22-12                  | 8Gb          | x8        | 3200MT/s   |

<sup>a</sup> We report “N/A” if no date is marked on the label of a module.

## 4. Simultaneous Multiple-Row Activation in Neighboring Subarrays in COTS DRAM Chips

We hypothesize that a COTS DRAM chip can perform bitwise NOT and many-input Boolean operations if such a chip can simultaneously activate multiple DRAM rows in neighboring subarrays, as briefly described in §1. For example, activating one row (i.e., *source row*) in one subarray and another row (i.e., *destination row*) in a neighboring subarray in very short succession could negate the value of the source row and store it in the destination row. Thus, comprehensively evaluating the computational capabilities of a COTS DRAM chip necessitates understanding its simultaneous multiple-row activation properties. These properties include, for example, whether or not the DRAM chip can simultaneously activate multiple rows in neighboring subarrays and how many such rows, in each subarray, the chip can simultaneously activate. This section describes our key idea to perform simultaneous multiple-row activation in neighboring subarrays (§4.1), our experimental methodology for understanding a DRAM chip’s multiple-row activation properties (§4.2) and presents our findings (§4.3).

### 4.1. Multiple-Row Activation in Neighboring Subarrays: Key Idea

**Multiple-Row Activation in One Subarray.** A carefully crafted  $\text{ACT } R_F \rightarrow \text{PRE} \rightarrow \text{ACT } R_L$  DRAM command sequence with reduced timings (i.e., with violated  $t_{RAS}$  and  $t_{RP}$  timing parameters) can simultaneously activate multiple rows in the *same subarray*, as shown by prior works [37, 38, 41, 42, 45]. In the  $\text{ACT } R_F \rightarrow \text{PRE} \rightarrow \text{ACT } R_L$  command sequence,  $R_F$  and  $R_L$  denote the row addresses to which the ACT commands are sent (i.e., the first ACT command is sent to  $R_F$  and the second, i.e., last, ACT command is sent to  $R_L$ ), and the rows  $R_F$  and  $R_L$  are in the same subarray.

**Key Idea.** Our key idea is to simultaneously activate multiple rows in *two neighboring subarrays* by issuing the  $\text{ACT } R_F \rightarrow \text{PRE} \rightarrow \text{ACT } R_L$  command sequence with reduced timings, where  $R_F$  points to a row in one subarray and  $R_L$  points to a row in a neighboring subarray.

**How to Simultaneously Activate Multiple Rows.** Our key idea relies on a fundamental design characteristic of modern DRAM chips: the hierarchical design of the row decoder circuitry. Modern DRAM chips have multiple levels of row address decoding stages to reduce latency, area, and power consumption [37, 65, 67, 102, 103]. The hierarchical row decoder expands a row address into a set of intermediate control signals, and issuing an ACT command asserts multiple control signals [37, 65, 67, 102]. Prior works [37, 38] hypothesize that back-to-back ACT commands with reduced timings assert multiple control signals in the hierarchical row decoder that can potentially activate multiple rows.

**Key Mechanism.** By leveraging the hierarchical structure of the row decoder circuitry, we hypothesize that we can simultaneously activate multiple rows in two neighboring subarrays in three steps. First, we issue the  $\text{ACT } R_F$  command, which activates  $R_F$  and asserts multiple signals/latches in the hierarchical row decoder circuitry. Second, we issue a PRE command by violating the  $t_{RAS}$  timing (e.g.,  $t_{RAS} < 3\text{ns}$ ), which initiates the deassertion of signals and latches in the hierarchical row decoder design. Third, we issue the  $\text{ACT } R_L$  command by violating the  $t_{RP}$  timing (e.g.,  $t_{RP} < 3\text{ns}$ ), which 1) prevents the PRE command from deasserting the signals or resetting the latches in the row decoder and 2) sets other signals/latches in the row decoder.
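The latch hypothesis can be illustrated with a deliberately simplified Python model. Real row decoder designs are undisclosed, so everything below is hypothetical. A row address is written as a `(subarray, upper, lower)` tuple of predecoder fields. The model assumes that the violated-timing PRE clears none of $R_F$'s latches, so each decoder stage ends up selecting both $R_F$'s and $R_L$'s value.

```python
from itertools import product


def activated_rows(r_f, r_l):
    """Rows whose every field value is latched by either ACT.

    Assumption: the violated-timing PRE never clears R_F's latches, so each decoder
    stage ends up with the union of the values selected by R_F and R_L.
    """
    latched = [sorted({f, l}) for f, l in zip(r_f, r_l)]
    return set(product(*latched))


def activation_type(r_f, r_l):
    """(N_RF, N_RL): activated rows in R_F's and in R_L's subarray."""
    rows = activated_rows(r_f, r_l)
    n_rf = sum(1 for row in rows if row[0] == r_f[0])
    n_rl = sum(1 for row in rows if row[0] == r_l[0])
    return n_rf, n_rl


# Row addresses written as (subarray, upper, lower) tuples.
print(activation_type((0, 2, 5), (1, 3, 5)))  # (2, 2)
print(activation_type((0, 1, 4), (1, 2, 7)))  # (4, 4)
```

This toy model can only produce symmetric activation with a power-of-two count per subarray. By itself it cannot explain asymmetric types such as the 2:4 example used in the terminology below.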

### 4.2. Experimental Methodology

**DRAM Subarray Boundaries.** Understanding COTS DRAM chips’ capability of simultaneously activating multiple rows in neighboring subarrays requires reverse engineering DRAM subarray boundaries. Prior works [42, 45] demonstrate that COTS DRAM chips are capable of performing the RowClone operation [49], i.e., copying the content of a DRAM row (i.e., source row) to another DRAM row (i.e., destination row), if the source row and the destination row are in the *same subarray* (see more detail in §2.2). We repeatedly perform the RowClone operation for every possible pair of source and destination row addresses in each tested bank. When we observe that the destination row gets the same content as the source row after RowClone, we conclude that the source row and the destination row are in the same subarray. Based on these results, we reverse engineer the DRAM subarray boundaries, i.e., which rows are in the same subarray.
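The grouping step can be sketched in Python as follows. The oracle `rowclone_succeeds(src, dst)` is hypothetical. In a real experiment, it would initialize both rows, issue the reduced-timing RowClone sequence, and compare the destination row with the source row. Unlike the exhaustive all-pairs procedure described above, this sketch compares each row against only one representative per known group. That shortcut assumes the oracle is error-free and transitive, whereas a real characterization would repeat trials to tolerate unreliable copies.

```python
def infer_subarrays(num_rows, rowclone_succeeds):
    """Group row addresses into subarrays using a RowClone success oracle.

    rowclone_succeeds(src, dst) -> True if dst holds src's content after RowClone.
    """
    groups = []
    for row in range(num_rows):
        for group in groups:
            if rowclone_succeeds(group[0], row):
                group.append(row)
                break
        else:
            # No existing subarray accepted the row: it starts a new one.
            groups.append([row])
    return groups


# Hypothetical oracle for a chip with 4-row subarrays.
print(infer_subarrays(8, lambda src, dst: src // 4 == dst // 4))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```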

**Testing Methodology.** Our experiment consists of three steps. First, we initialize the subarrays corresponding to  $R_F$  and  $R_L$  with a predefined (initial) data pattern. Second, we issue the  $\text{ACT } R_F \rightarrow \text{PRE} \rightarrow \text{ACT } R_L$  command sequence with reduced timings to simultaneously activate multiple rows in neighboring subarrays (i.e., our key mechanism, described in §4.1). Third, while respecting the timing parameters, we issue a WR command whose data pattern differs from the initial data pattern. This WR command causes the sense amplifiers to overdrive their bitlines [65, 104] and thus updates the values of the cells in all simultaneously-activated DRAM rows with the WR data pattern. After the three-step procedure, to understand which rows in  $R_F$ 's and  $R_L$ 's subarrays are activated by the  $\text{ACT } R_F \rightarrow \text{PRE} \rightarrow \text{ACT } R_L$  command sequence, we precharge the bank and read each row in the subarrays of  $R_F$  and  $R_L$  while adhering to the timing parameters. We expect two outcomes from the read operation. First, the simultaneously activated rows in  $R_L$ 's subarray store the WR data pattern (i.e., the exact data pattern sent with the WR command) because  $R_L$  receives the last activate command in the ACT  $R_F \rightarrow$  PRE  $\rightarrow$  ACT  $R_L$  command sequence. Second, if  $R_F$ 's and  $R_L$ 's subarrays are neighbors, half of the DRAM cells in the simultaneously activated rows in  $R_F$ 's subarray (i.e., those connected to the sense amplifiers shared with  $R_L$ 's subarray) store the inverse of the WR data pattern. This way, we determine 1) whether  $R_F$  and  $R_L$  are in neighboring subarrays and 2) how many (and which) rows ACT  $R_F \rightarrow$  PRE  $\rightarrow$  ACT  $R_L$  simultaneously activates in neighboring subarrays.
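The two expected outcomes translate into a simple per-row check, sketched below in Python. Rows are lists of bits. The `shared_cols` input is hypothetical: it lists the columns of $R_F$'s subarray wired to the sense amplifiers shared with $R_L$'s subarray, which a real experiment would have to determine separately. The check assumes that the initial data pattern differs from the inverse of the WR data pattern on those columns. Otherwise, an unactivated $R_F$-side row would be indistinguishable from an activated one.

```python
def looks_activated(row_bits, wr_pattern, in_rl_subarray, shared_cols):
    """Apply the expected read-back outcome for one row."""
    if in_rl_subarray:
        # R_L-side rows should hold exactly the WR data pattern.
        return row_bits == wr_pattern
    # R_F-side rows should hold the inverse of the WR pattern on the shared columns.
    return all(row_bits[c] == 1 - wr_pattern[c] for c in shared_cols)


def count_activated(rf_rows, rl_rows, wr_pattern, shared_cols):
    """Return (N_RF, N_RL) for one ACT R_F -> PRE -> ACT R_L experiment."""
    n_rf = sum(looks_activated(r, wr_pattern, False, shared_cols) for r in rf_rows)
    n_rl = sum(looks_activated(r, wr_pattern, True, shared_cols) for r in rl_rows)
    return n_rf, n_rl


wr = [1, 0, 1, 0]       # WR data pattern
init = [0, 0, 0, 0]     # initial data pattern
flipped = [0, 1, 0, 1]  # R_F-side row holding ~WR on shared columns 1 and 3
rf_rows = [flipped, init, flipped]
rl_rows = [wr, wr, wr, wr, init]
print(count_activated(rf_rows, rl_rows, wr, shared_cols=[1, 3]))  # (2, 4)
```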

**Number of  $R_F$  and  $R_L$  Combinations Tested.** We extensively perform our experiments in four randomly selected neighboring subarray pairs (i.e., a total of eight subarrays) for each bank in each tested SK Hynix module. We test every possible combination of  $R_F$  and  $R_L$  in ACT  $R_F \rightarrow$  PRE  $\rightarrow$  ACT  $R_L$  for each of the neighboring subarray pairs.<sup>3</sup>

**Terminology.** To ease the understanding of how many rows we can simultaneously activate in two neighboring subarrays, we introduce a new term,  $N_{RF}:N_{RL}$  activation where  $N_{RF}$  is the number of simultaneously activated rows in  $R_F$ 's subarray and  $N_{RL}$  is the number of simultaneously activated rows in  $R_L$ 's subarray. For example, 2:4 activation stands for simultaneously activating two rows in  $R_F$ 's subarray and four rows in  $R_L$ 's subarray.

**Metric.** We define a metric called *coverage of each unique  $N_{RF}:N_{RL}$  activation type* as the fraction of all tested  $R_F$  and  $R_L$  combinations that result in that activation type. For example, if 100 out of 1000 combinations of  $R_F$  and  $R_L$  result in 2:4 activation, the coverage of 2:4 activation is 10%.
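A minimal Python sketch of the coverage metric follows. It assumes (our assumption, for illustration) that the experiment yields, for every tested ( $R_F$ ,  $R_L$ ) pair, either the observed ( $N_{RF}$ ,  $N_{RL}$ ) activation type or `None` when no simultaneous activation occurs; the function name `coverage` and the input format are hypothetical and not part of the released infrastructure.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

ActType = Tuple[int, int]  # (N_RF, N_RL), e.g., (2, 4) for 2:4 activation


def coverage(observed: Dict[Tuple[int, int], Optional[ActType]]) -> Dict[str, float]:
    """Return the coverage (in %) of each N_RF:N_RL activation type.

    observed maps each tested (R_F, R_L) row-address pair to the activation
    type it produced, or None if no simultaneous activation was observed.
    """
    total = len(observed)
    counts = Counter(t for t in observed.values() if t is not None)
    return {f"{rf}:{rl}": 100.0 * n / total for (rf, rl), n in counts.items()}


# The example from the text: 100 of 1000 combinations yield 2:4 activation.
combos = {(rf, 0): ((2, 4) if rf < 100 else None) for rf in range(1000)}
print(coverage(combos))  # {'2:4': 10.0}
```

Combinations that produce no simultaneous activation still count toward the denominator, which is why the coverages of all activation types need not sum to 100%.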

### 4.3. COTS DRAM Chip Characterization

This section presents our characterization of simultaneous multiple-row activation in neighboring subarrays in SK Hynix chips. We also conduct experiments on DRAM modules from the two other major manufacturers (i.e., Samsung and Micron). In Samsung chips, we observe only sequential two-row activation in neighboring subarrays, whereas in Micron chips, we observe neither simultaneous nor sequential row activation in neighboring subarrays.<sup>4</sup>

Fig. 5 shows the distribution of observed coverage of each  $N_{RF}:N_{RL}$  activation type across all tested combinations of  $R_F$  and  $R_L$  row addresses in a box-and-whiskers plot.<sup>5</sup> The x-axis shows the  $N_{RF}:N_{RL}$  activation type and the y-axis shows the coverage of each  $N_{RF}:N_{RL}$  activation type. We make Observations 1 and 2 from Fig. 5.

**Observation 1.** COTS DRAM chips can simultaneously activate multiple rows in two neighboring subarrays.

<sup>3</sup>We test a total of 409,600 combinations of  $R_F$  and  $R_L$  row addresses for each neighboring subarray pair.

<sup>4</sup>We refer the reader to §7 for a hypothesis on why we observe limited capability in Micron and Samsung chips. The extended version of this paper [101] presents more results and discussion on this finding.

<sup>5</sup>The box is lower-bounded by the first quartile (i.e., the median of the first half of the ordered set of data points) and upper-bounded by the third quartile (i.e., the median of the second half of the ordered set of data points). The interquartile range (IQR) is the distance between the first and third quartiles (i.e., box size). Whiskers show the minimum and maximum values.



Figure 5: Coverage of each  $N_{RF}:N_{RL}$  activation type across tested  $R_F$  and  $R_L$  row pairs.

COTS DRAM chips can perform 1:1, 1:2, 2:2, 2:4, 4:4, 4:8, 8:8, 8:16, 16:16, and 16:32 activation with an average coverage across all tested DRAM chips of 0.23%, 0.15%, 2.60%, 1.53%, 11.58%, 5.42%, 24.52%, 7.95%, 24.35%, and 3.82%, respectively. We observe that when we issue ACT  $R_F \rightarrow$  PRE  $\rightarrow$  ACT  $R_L$  followed by a WR command, all cells in the simultaneously activated row(s) in  $R_L$ 's subarray store the exact data pattern sent with the WR command. On the other hand, half of the cells in the simultaneously activated row(s) in  $R_F$ 's subarray store the negated value of the data pattern sent with the WR command, while the remaining half retain their initial values. This observation supports our hypothesis in §4.2.

**Observation 2.** COTS DRAM chips have two distinct sets of  $N_{RF}:N_{RL}$  activation patterns in neighboring subarrays: one where  $N_{RF}=N_{RL}$  and another where  $2 \times N_{RF}=N_{RL}$ .

COTS DRAM chips have two distinct  $N_{RF}:N_{RL}$  activation patterns in two neighboring subarrays: 1) N:N, where exactly the same number of rows are activated in each subarray (i.e.,  $N_{RF}=N_{RL}=N$ ) and 2) N:2N, where twice as many rows are activated in  $R_L$ 's subarray as in  $R_F$ 's (i.e.,  $N_{RF}=N$  and  $N_{RL}=2N$ ). We observe that some DRAM modules can perform both N:2N and N:N activation patterns, resulting in simultaneous activation of up to 48 rows ( $N=16$ ). On the other hand, some DRAM modules only support the N:N activation pattern, which results in simultaneous activation of up to 32 rows ( $N=16$ ).

We observe that the row addresses of  $R_F$  and  $R_L$  in the ACT  $R_F \rightarrow$  PRE  $\rightarrow$  ACT  $R_L$  command sequence determine 1) the  $N_{RF}:N_{RL}$  activation pattern (i.e., N:N or N:2N) and 2) the number of simultaneously activated rows (i.e., the value of  $N$ ). We hypothesize that this is due to the structure of the row decoder circuitry. A concurrent work [105] describes in detail a hypothetical row decoder design and how ACT  $src \rightarrow$  PRE  $\rightarrow$  ACT  $dst$  can activate multiple rows with it; this design could explain how N:N and N:2N activation patterns arise and for which addresses. Due to space limitations, we refer the reader to [105] for more detail on this hypothetical row decoder circuitry. From Obsvs. 1 and 2, we draw Takeaway 1.

**Takeaway 1.** COTS DRAM chips can simultaneously activate multiple DRAM rows (up to 48 DRAM rows) in two neighboring subarrays.

## 5. NOT Operation in COTS DRAM Chips

We demonstrate a new computation capability of commercial off-the-shelf (COTS) DRAM chips: we can perform the bitwise NOT operation in COTS DRAM chips by leveraging simultaneous multiple row activation in neighboring subarrays (§4). We simultaneously activate multiple rows in two neighboring subarrays to connect a row in one subarray (i.e., source row) to a row in a neighboring subarray (i.e., destination row) via a NOT gate in the sense amplifiers, shared by these neighboring subarrays. This connection results in (half of) the destination row storing the negated value of (half of) the source row (§5.1). We demonstrate that COTS DRAM chips can execute the NOT operation and quantitatively analyze the reliability of performing the NOT operation in 256 COTS DRAM chips (§5.3).

### 5.1. NOT Operation: Key Idea

We leverage our key observation, simultaneous activation of multiple rows in two neighboring subarrays, to enable NOT in COTS DRAM chips. Our key idea is to simultaneously activate multiple rows in neighboring subarrays so that the activated rows are connected through a NOT gate in the shared sense amplifier. We use this connection to write the negated data of (half of) one row (i.e., the source row) into another row that resides in the neighboring subarray (i.e., the destination row).

Fig. 6 illustrates our key idea to enable NOT using an example of two cells (*src* and *dst*) whose subarrays are neighbors, i.e., physically adjacent. The memory controller issues each command (shown in orange boxes below the time axis) at the corresponding tick mark, and asserted signals are highlighted in red. Cells initially have a voltage level of ground (GND), and the bitline (i.e., *src*'s bitline) and bitline-bar (i.e., *dst*'s bitline) initially have a voltage level of VDD/2 (1).



**Figure 6: Command sequence for performing the NOT operation in COTS DRAM chips and the state of cells during each related step.**

Our key idea enables the NOT operation in COTS DRAM chips in four steps. First, we issue an ACT command to *src*, i.e., ACT *src* (a), and wait for the manufacturer-recommended *t<sub>RAS</sub>* timing parameter to restore the charge of *src* (b). As a result, the bitline reaches the *src* voltage (GND), whereas the bitline-bar reaches the negated *src* voltage (VDD) (2). Second, we issue a PRE command (c) and, after a violated *t<sub>RP</sub>* timing (e.g., <3ns) (d), we issue another ACT command to activate *dst*, i.e., ACT *dst* (e). Issuing back-to-back PRE → ACT *dst* activates *dst* without deactivating *src* (3) [37, 38, 42, 45]. This results in the bitline-bar sharing its charge with *dst* by driving the negated voltage value of *src* (VDD) into *dst* (4). Third, we wait for the manufacturer-recommended *t<sub>RAS</sub>* timing parameter (f), which completely restores the charge of *dst*, and thus, the negated value of *src* (VDD) is written to *dst* (5). Fourth, we send a PRE command to complete the process (g).
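The four steps can be summarized with a purely behavioral Python model. This is a sketch under our own assumptions, not the authors' implementation: every activation completes, a hypothetical `shared_mask` marks the columns whose sense amplifier is shared by the two subarrays, and cells on the remaining columns keep their previous values. It ignores timing, charge sharing, and process variation, so it says nothing about success rates.

```python
from typing import List


def not_operation(src: List[int], dst: List[int], shared_mask: List[bool]) -> List[int]:
    """Behavioral model of ACT src -> PRE -> ACT dst with a violated tRP.

    Step 1 (ACT src, wait tRAS): each sense amplifier latches src's bit on
    the bitline and its complement on the bitline-bar.
    Steps 2-3 (PRE with violated tRP, ACT dst, wait tRAS): dst connects to the
    bitline-bar of the shared sense amplifiers and is overwritten with NOT src.
    Step 4 (PRE): the negated value remains stored in dst.
    shared_mask is a hypothetical per-column flag: True if the column's sense
    amplifier is shared with dst's subarray (assumption for illustration).
    """
    if not (len(src) == len(dst) == len(shared_mask)):
        raise ValueError("src, dst, and shared_mask must have the same length")
    bitline_bar = [1 - bit for bit in src]  # complement latched by the sense amplifier
    # Only columns with a shared sense amplifier receive the negated value.
    return [bitline_bar[i] if shared_mask[i] else dst[i] for i in range(len(dst))]


# Hypothetical layout: every other column's sense amplifier is shared.
src = [0, 1, 1, 0, 1, 0]
dst = [0, 0, 0, 0, 0, 0]
mask = [i % 2 == 0 for i in range(6)]
print(not_operation(src, dst, mask))  # [1, 0, 0, 0, 0, 0]
```

The mask captures why only half of a row can be negated: in this model, a column whose sense amplifier is not shared with *dst*'s subarray never connects *src* to *dst*.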

## 5.2. Experimental Methodology

**Testing Methodology.** Our experiment consists of four steps. We 1) initialize the source row (*src*)'s subarray and the destination row (*dst*)'s subarray, which are neighbors, with a predetermined data pattern, 2) write a data pattern that is different from the predetermined data pattern to *src*, 3) issue the ACT *src* → PRE → ACT *dst* command sequence with reduced *t<sub>RP</sub>* timing to perform the NOT operation, as described in §5.1, and 4) wait for *t<sub>RAS</sub>* and precharge the tested bank. After the four-step procedure, we read all rows in *dst*'s subarray. If COTS DRAM chips can execute the NOT operation, half of the cells in the simultaneously activated rows in *dst*'s subarray will store the *negated* value of the random data pattern written to *src*.<sup>6</sup> We refer to the simultaneously activated rows in *dst*'s subarray as *destination rows*.

**Distance Between a Row and Sense Amplifiers.** To understand the effects of *design-induced variation* [106] on the reliability of NOT operations, we analyze how the distance between activated rows (i.e., *src* and destination rows) and the shared sense amplifiers (i.e., the sense amplifiers physically adjacent to both of the neighboring subarrays) affects the reliability of NOT operations.<sup>7</sup>

We leverage a widely-used reverse engineering technique [99, 100, 107–110] where we analyze RowHammer [111–113] errors (i.e., bitflips) to understand how close a row is to the sense amplifiers. The technique relies on a key characteristic of RowHammer bitflips: when we perform single-sided RowHammer [99, 100, 109, 111, 114], we observe bitflips in a row that is physically adjacent to the frequently activated (or hammered) aggressor row. If we observe bitflips in two rows, we conclude that one of them is located above the aggressor row while the other is located below the aggressor row. However, if we observe bitflips in only one row, we hypothesize that the aggressor row is physically adjacent to the sense amplifier, and thus, only one row has bitflips. By leveraging these observations, we uncover the physical order of rows in every tested subarray.

We categorize the distance between activated rows and the sense amplifiers into three regions, each of which has one third of all rows in the subarray: Far, Close, and Middle. “Far” refers to the rows that are farthest away from the sense amplifiers, “Close” refers to the rows that are closest to the sense amplifiers, and “Middle” refers to the remaining rows in the subarray.
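The reverse-engineering heuristic and the three-way split can be sketched in Python. This is a minimal sketch under our own assumptions: single-sided RowHammer results are given as a mapping from each aggressor row to the rows with bitflips, rows with bitflips are exactly the aggressor's physical neighbors, and the edge row next to the shared sense amplifiers is already known. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set


def physical_order(victims: Dict[int, Set[int]]) -> List[int]:
    """Chain rows into their physical order from single-sided RowHammer results.

    victims maps each hammered (aggressor) row to the set of rows with
    bitflips, which we treat as its physical neighbors. A row at a subarray
    edge (next to sense amplifiers) has only one neighbor.
    """
    adj: Dict[int, Set[int]] = defaultdict(set)
    for aggressor, flipped in victims.items():
        for row in flipped:
            adj[aggressor].add(row)
            adj[row].add(aggressor)
    ends = sorted(row for row, nbrs in adj.items() if len(nbrs) == 1)
    if len(ends) != 2:
        raise ValueError("expected exactly two edge rows")
    order, prev = [ends[0]], None
    while True:  # walk the chain from one edge row to the other
        nxt = [row for row in adj[order[-1]] if row != prev]
        if not nxt:
            break
        if len(nxt) > 1:
            raise ValueError("row adjacency is not a simple chain")
        prev = order[-1]
        order.append(nxt[0])
    if len(order) != len(adj):
        raise ValueError("some rows are not connected to the chain")
    return order


def distance_regions(order: List[int], shared_sa_edge: int) -> Dict[int, str]:
    """Split rows into thirds by distance to the shared sense amplifiers.

    shared_sa_edge is the edge row bordering the sense amplifiers shared with
    the neighboring subarray (assumed to be known).
    """
    if order[-1] == shared_sa_edge:
        order = order[::-1]
    elif order[0] != shared_sa_edge:
        raise ValueError("shared_sa_edge must be an edge row")
    n = len(order)
    return {row: ("Close", "Middle", "Far")[3 * i // n] for i, row in enumerate(order)}


# Six logical rows whose (hidden) physical order is 3, 0, 5, 1, 4, 2.
victims = {3: {0}, 0: {3, 5}, 5: {0, 1}, 1: {5, 4}, 4: {1, 2}, 2: {4}}
order = physical_order(victims)
print(order)  # [2, 4, 1, 5, 0, 3]
print(distance_regions(order, shared_sa_edge=3))
# {3: 'Close', 0: 'Close', 5: 'Middle', 1: 'Middle', 4: 'Far', 2: 'Far'}
```

When the row count is not divisible by three, the integer split assigns the leftover rows to the regions closer to the sense amplifiers.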

**Metric.** To quantitatively evaluate the reliability of the NOT operation, we use a metric called *success rate*. The success rate for a DRAM cell is the percentage of trials, out of all 10000 tested trials, in which the DRAM cell stores the negated value of *src*. For example, if a cell (in *dst*) stores the negated value (of *src*) in 1000 out of 10000 trials, the success rate of the cell is 10%. We define the *average success rate* as the mean success rate across all tested DRAM cells.

<sup>6</sup>Since half of the bitlines are shared between two neighboring subarrays through a NOT gate, the proposed NOT operation can negate half of the row.

<sup>7</sup>We are interested in analyzing the effects of design-induced variation because prior work [106] experimentally demonstrates that there is large design-induced variation in DRAM cells based on their physical location in the DRAM chip. For example, cells closer to the peripheral structures (e.g., sense amplifiers or wordline drivers) can be accessed faster than cells that are farther away [106].

**The Number of Tested Instances.** We test all 16 banks in each tested SK Hynix and Samsung chip. We extensively perform our experiments in four randomly selected neighboring subarray pairs (i.e., a total of eight subarrays) in each bank. For each neighboring subarray pair, we test every possible combination of `src` and `dst` row addresses (e.g., 409,600 such combinations for a subarray pair in an SK Hynix module).

**Data Patterns.** We use two random data patterns (RAND1 and RAND2) and two fixed data patterns (all-1s and all-0s, which fill a row with logic-1 and logic-0 values, respectively). We fill 1) `src` with RAND1 and 2) the rows in `dst`'s subarray and the rows in `src`'s subarray (except `src`) with RAND2. We present the results for random data patterns, as we observe a success rate difference of less than 0.1% between initializing rows with random data and with all-1s/all-0s.

**Temperature.** We perform our experiments at five temperature levels: 50°C, 60°C, 70°C, 80°C, and 95°C. All experiments are conducted at 50°C unless stated otherwise.
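The success-rate metric defined above can be sketched for one destination row as follows. This is a minimal sketch under our own assumptions: each trial yields the destination row as read back, the *src* data is known, and a hypothetical `shared_mask` marks the columns reachable through a shared sense amplifier (only those cells can hold the negated value, so only they are scored). Function names are illustrative.

```python
from typing import List, Sequence


def cell_success_rates(trials: Sequence[Sequence[int]], src: Sequence[int],
                       shared_mask: Sequence[bool]) -> List[float]:
    """Per-cell success rate (%) of the NOT operation for one destination row.

    trials holds the destination row read back after each trial. A trial
    succeeds for a cell if the cell stores the negated src bit. Only columns
    marked True in shared_mask are scored.
    """
    if not trials:
        raise ValueError("need at least one trial")
    cols = [c for c, shared in enumerate(shared_mask) if shared]
    n = len(trials)
    return [100.0 * sum(row[c] == 1 - src[c] for row in trials) / n for c in cols]


def average_success_rate(rates: Sequence[float]) -> float:
    """Mean of the per-cell success rates across all tested cells."""
    return sum(rates) / len(rates)


# The example from the text: a cell stores NOT src in 1000 of 10000 trials.
src = [0, 1]
mask = [True, False]
trials = [[1, 1]] * 1000 + [[0, 1]] * 9000
rates = cell_success_rates(trials, src, mask)
print(rates)  # [10.0]
print(average_success_rate(rates))  # 10.0
```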

### 5.3. COTS DRAM Chip Characterization

This section presents our rigorous characterization of the reliability of NOT operations in SK Hynix and Samsung chips. While we test all three major manufacturers (i.e., SK Hynix, Samsung, and Micron), we note that 1) NOT operations in Samsung chips have only one destination row, as we observe only sequential two-row activation in neighboring subarrays (§4.3) in Samsung chips, and 2) we do *not* observe NOT operations in Micron chips.<sup>4</sup>

**Number of Destination Rows.** We analyze the reliability of NOT operations with 1, 2, 4, 8, 16, and 32 destination rows. Fig. 7 shows the success rate distribution across DRAM cells in all tested SK Hynix and Samsung DRAM chips in a box and whiskers plot as we vary the number of destination rows from 1 to 32.<sup>5</sup> We make Observations 3 and 4 from Fig. 7.



Figure 7: Success rate of the NOT operation in COTS DRAM chips with varying numbers of destination rows.

**Observation 3.** In every tested number of destination rows, there is at least one DRAM cell where we can perform the NOT operation with a 100% success rate.

COTS DRAM chips can perform NOT operations with 1, 2, 4, 8, 16, and 32 destination rows. For every tested number of destination rows, there is at least one cell that performs the NOT operation with a 100% success rate.

**Observation 4.** As the number of destination rows increases, more DRAM cells produce incorrect results, leading to a decrease in success rate.

For example, the average success rate of the NOT operation with one destination row is 98.37%, but it is only 7.95% with 32 destination rows. We hypothesize that the decrease in success rate could be due to the increase in total bitline capacitance that a sense amplifier must drive as the number of simultaneously activated rows increases. A sense amplifier has to drive the negated voltage level of a source row's cell into 32 destination cells when we perform the NOT operation with 32 destination rows, whereas it needs to drive the negated voltage level into *only* one destination cell when we perform the NOT operation with a *single* destination row. Weaker sense amplifiers could fail to correctly drive the negated value of a source row's cell into multiple destination rows, a task that DRAM sense amplifiers are *not* designed for.

**Patterns of Multiple Row Activation in Neighboring Subarrays.** We analyze the effect of  $N_{RF}:N_{RL}$  activation (where  $N_{RF}$  is the number of simultaneously activated rows in `src`'s subarray, and  $N_{RL}$  is the number of simultaneously activated rows in `dst`'s subarray) on the reliability of the NOT operation. In §4.3, we observe two distinct sets of  $N_{RF}:N_{RL}$  activation patterns: 1) N:N where  $N_{RF}=N_{RL}$  and 2) N:2N where  $2 \times N_{RF}=N_{RL}$ . Fig. 8 shows the distribution of success rate across DRAM cells in a box and whiskers plot, with the x-axis representing  $N_{RF}:N_{RL}$  activation types (e.g., 1:2 means that one row is activated in `src`'s subarray and two rows are activated in `dst`'s subarray). We make Observation 5 from Fig. 8.



Figure 8: Success rate of the NOT operation vs.  $N_{RF}:N_{RL}$  activation type.

**Observation 5.** The N:2N activation pattern results in slightly higher success rate than the N:N activation pattern.

Using the N:2N activation pattern to perform the NOT operation exhibits 9.41% higher average success rate than using the N:N activation pattern to perform the NOT operation. We hypothesize that this is due to the total number of activated rows in two neighboring subarrays. For example, to perform the NOT operation with 16 destination rows, sense amplifiers have to simultaneously drive 32 (24) rows when using the N:N (N:2N) pattern. This results in sense amplifiers driving more rows in the N:N pattern than the N:2N pattern. Due to process variation, not all sense amplifiers are strong enough to correctly drive the corresponding voltage value into multiple rows (i.e., `src`'s voltage value into the activated rows in `src`'s subarray and the negated voltage value of `src` into destination rows) at once. This leads to a decrease in the success rate of the NOT operation as the number of simultaneously activated rows increases.
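The row counts in this argument follow directly from the two activation patterns. A small sketch (the function name is illustrative) computes how many rows the shared sense amplifiers drive for a given number of destination rows, and recovers the maximum simultaneous activation of Obsv. 2:

```python
def rows_driven(dest_rows: int, pattern: str) -> int:
    """Total rows driven by the sense amplifiers during a NOT operation.

    dest_rows is N_RL, the number of rows activated in dst's subarray.
    With N:N, N_RF equals N_RL; with N:2N, N_RF is half of N_RL.
    """
    if pattern == "N:N":
        n_rf = dest_rows
    elif pattern == "N:2N":
        if dest_rows % 2:
            raise ValueError("N:2N needs an even number of destination rows")
        n_rf = dest_rows // 2
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return n_rf + dest_rows


# 16 destination rows: 16 + 16 rows (N:N) vs. 8 + 16 rows (N:2N).
print(rows_driven(16, "N:N"), rows_driven(16, "N:2N"))  # 32 24
# Maximum simultaneous activation with N = 16: 16:16 vs. 16:32.
print(rows_driven(16, "N:N"), rows_driven(32, "N:2N"))  # 32 48
```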

**Distance to the Sense Amplifier.** Fig. 9 shows the average success rate of the NOT operation (i.e., the mean success rate of every tested DRAM cell in all tested destination rows) using a heatmap plot based on the proximity of  $\text{src}$  (y-axis) and all activated destination rows (x-axis) to the sense amplifiers that physically reside between (i.e., shared by)  $\text{src}$  and  $\text{dst}$ 's subarrays. We make Observation 6 from Fig. 9.



Figure 9: Success rate of the NOT operation based on the distance of activated rows to sense amplifiers.

**Observation 6.** The success rate of the NOT operation significantly varies based on the proximity of the activated rows ( $\text{src}$  and the destination rows) to the sense amplifiers.

For example, when the source row is in the middle of the subarray, and the destination rows are far from the sense amplifier (i.e., Middle-Far), the average success rate is 85.02%. However, when the source row is far from the sense amplifiers and the destination rows are close to the sense amplifiers (i.e., Far-Close), the average success rate is 44.16%. We hypothesize that *design-induced variation* [106] could explain the large variation in the success rate. Design-induced variation causes cell access latency characteristics to vary deterministically depending on the physical location of cells in the memory chip, including the cell's distance to the sense amplifier.

**Temperature.** In this experiment, we use destination cells that can perform NOT operations with >90% success rate at 50°C.<sup>8</sup> Fig. 10 shows the success rate distribution across all tested DRAM cells for five different temperature levels: 50°C, 60°C, 70°C, 80°C, and 95°C. The figure clusters boxes into six groups based on the number of activated destination rows (x-axis). In each cluster of boxes, temperature increases from left to right. We make Observation 7 from Fig. 10.



Figure 10: Success rate of the NOT operation at different DRAM chip temperatures.

**Observation 7.** Temperature has a small effect on the reliability of NOT operations in COTS DRAM chips.

For example, performing the NOT operation with 32 destination rows (which is the most vulnerable operation to temperature variation) exhibits a 0.20% variation in average success rate when the temperature increases from 50°C to 95°C.

<sup>8</sup>To maintain a reasonable testing time, we test destination cells with >90% success rate from every tested destination row, obtained from all tested SK Hynix DRAM modules (on average  $\approx 1.39$  billion such cells per module).

**Takeaway 2.** COTS DRAM chips can perform NOT operations with up to 32 destination rows. NOT operations are highly resilient to temperature changes.

**DRAM Speed Rate.** We analyze the effect of DRAM speed rate on the success rate of NOT operations in SK Hynix DRAM modules. Fig. 11 shows the success rate distribution across DRAM cells, with the x-axis representing the number of destination rows and the hue showing the DRAM speed rate. In each cluster of boxes, the DRAM speed rate increases from left to right. We make Observation 8 from Fig. 11.



Figure 11: Success rate of the NOT operation for different DRAM speed rates.

**Observation 8.** The success rate of a NOT operation significantly varies across tested DRAM speed rates.

For example, for the NOT operation with 4 destination rows, average success rate decreases by 20.06% as DRAM speed rate increases from 2133 MT/s to 2400 MT/s. However, the average success rate of the NOT operation with 4 destination rows increases by 19.76% from 2400 MT/s to 2666 MT/s.

**Chip Density and Die Revision.** We analyze the effect of chip density and die revision on the reliability of NOT operations. To do so, we use DRAM modules from two major manufacturers spanning different chip densities and die revisions.<sup>2</sup> For this analysis, we show the results for performing NOT operations with only one destination row.<sup>9</sup> Fig. 12 shows the success rate distribution across DRAM cells, with the x-axis representing the module's chip density, die revision and manufacturer. We make Observation 9 from Fig. 12.



Figure 12: Success rate of the NOT operation for different chip density and die revision combinations for two major manufacturers.

**Observation 9.** Chip density and die revision affect the success rate of NOT operations for each tested manufacturer.

For example, in SK Hynix DRAM chips, the average success rate decreases by 8.05% from 8Gb M-die to 8Gb A-die. In Samsung DRAM chips, the average success rate decreases by 11.02% from A-die to D-die.

<sup>9</sup>We use one destination row because Samsung chips cannot perform the NOT operation with more than one destination row. The extended version of this paper [101] shows more comprehensive results for more than one destination row using SK Hynix chips.

**Takeaway 3.** NOT operation’s reliability significantly varies across DRAM chips with different speed rates, die revisions, and chip densities.

## 6. Many-Input NAND, NOR, AND, and OR in COTS DRAM Chips

We demonstrate a new computational capability of commercial off-the-shelf (COTS) DRAM chips: we can perform many-input (i.e., up to 16-input) NAND, NOR, AND, and OR operations in COTS DRAM chips. First, we leverage simultaneous multiple-row activation in neighboring subarrays (§4) to manipulate the bitline voltage in one of the neighboring subarrays, which we refer to as the *reference subarray*. Manipulating the bitline voltage in the reference subarray enables us to perform many-input AND and OR operations in the neighboring subarray of the reference subarray, which we refer to as the *compute subarray*. Second, we leverage the NOT gate connection (through sense amplifiers) between simultaneously activated rows in neighboring subarrays to realize NAND and NOR operations, such that the result of an AND (OR) operation on a set of input operands in the compute subarray is inverted to become the result of a NAND (NOR) operation on the same set of input operands in the reference subarray. This section describes our key idea to enable NAND, NOR, AND, and OR operations in detail (§6.1) and our experimental methodology (§6.2), and presents our rigorous characterization of whether and how well COTS DRAM chips can perform these Boolean operations (§6.3).

### 6.1. Many-Input Boolean Operations: Key Idea

Our key idea is to leverage an implication of simultaneous multiple row activation in neighboring subarrays for sense amplifier operation (see §2.1 for an overview of how a sense amplifier operates): the sense amplifier terminals exhibit voltage levels equivalent to *a function of the values stored in many simultaneously activated cells*.

To provide an intuitive high-level explanation of our key idea, we consider that a sense amplifier works in two steps where 1) it compares the voltage levels on its two terminals, *terminal A* and *terminal B*, and 2) drives terminal A with the result of the comparison (i.e., if  $A > B$ , the result is a high voltage value and vice versa) and terminal B with the inverse of the result. When this sense amplifier operates under standard operating conditions (i.e., single row activation), the result of the comparison is nothing but the value stored in the activated cell. This is because terminals A and B are initially precharged (i.e., they store  $VDD/2$ ), and the activation of a cell perturbs the bitline voltage (i.e., one of the two terminals stores a value lower or higher than  $VDD/2$  when the sense amplifier compares their values) towards the value stored in the activated cell. However, when we activate multiple cells connected to terminals A and B, we can make both terminal A and terminal B exhibit a wider variety of voltage levels when the sense amplifier compares the two terminals’ voltage levels. The wide variety of voltage levels is essentially a function of the values stored in the simultaneously activated cells. Next, we explain in detail how this function of the values stored in the cells could be combined with the sense amplifier’s fundamental comparator operation to implement many-input Boolean AND, OR, NAND, and NOR operations.

**6.1.1. Performing Two-Input AND and OR.** Fig. 13 illustrates an example of an in-DRAM two-input AND operation. The memory controller issues each command (shown in orange boxes below the time axis) at the corresponding tick mark, and asserted signals are highlighted in red. In this figure, we have two neighboring subarrays: the *reference subarray* and the *compute subarray*, each containing two cells.  $V_{REF}$  is the voltage level of the reference subarray’s bitline, and  $V_{COM}$  is the voltage level of the compute subarray’s bitline.<sup>10</sup> Assume that the ACT R<sub>REF</sub> → PRE → ACT R<sub>COM</sub> command sequence with reduced timing simultaneously activates all four rows in these two subarrays where R<sub>REF</sub> points to a row in the reference subarray and R<sub>COM</sub> points to a row in the compute subarray. Initially, 1) we store VDD in one cell and VDD/2 in the other cell in the reference subarray, and 2) we store a voltage level of X in one cell and Y in the other cell in the compute subarray (1).<sup>11</sup>



Figure 13: Performing two-input AND and NAND operations in COTS DRAM chips.

To perform a two-input AND operation, we first issue one ACT R<sub>REF</sub> → PRE → ACT R<sub>COM</sub> command sequence (1). Doing so activates four rows simultaneously and enables charge-sharing between their bitlines. At the end of charge-sharing,  $V_{REF}$  becomes  $3VDD/4$  (i.e., the mean of VDD and VDD/2) and  $V_{COM}$  becomes  $(X+Y)/2$  (2). The sense amplifier then kicks in and amplifies the voltage difference between  $V_{REF}$  and  $V_{COM}$ . If both X and Y are VDD (i.e.,  $V_{COM}=VDD$ ),  $V_{COM}$  is higher than  $V_{REF}$ , resulting in VDD in the compute subarray’s activated cells and GND in the reference subarray’s activated cells (3a). Otherwise,  $V_{COM}$  is lower than  $V_{REF}$ , resulting in GND in the compute subarray’s activated cells and VDD in the reference subarray’s activated cells (3b). After waiting for  $t_{RAS}$ , we issue a PRE command to complete the two-input AND operation.

We can perform a two-input OR operation when we store GND and VDD/2 in the activated cells of the reference subarray. This results in  $V_{REF}$  becoming VDD/4 (i.e., the mean of GND and VDD/2) in Fig. 13-②. Thus, the output will be i) logic-1 if at least one activated cell of the compute subarray stores VDD and ii) logic-0 otherwise.

<sup>10</sup>To simplify the explanation, we assume that the bitline has no capacitance (i.e., after charge sharing, the bitline’s voltage is the *mean* voltage value stored in DRAM cells that contribute to charge sharing) and the modeled DRAM circuitry is not subject to the effects of process variation. We later discuss and experimentally demonstrate (§6.3) how process variation affects the operation described.

<sup>11</sup>We store VDD/2 in a DRAM cell using an operation called Frac [38] in COTS DRAM chips.

**6.1.2. Performing N-Input AND and OR.** We perform  $N$ -input AND and OR operations by activating  $N$  rows in the reference subarray and  $N$  rows in the compute subarray. To simplify our explanations, we refer to  $V_{REF}$  as  $V_{AND}$  when we execute the AND operation and as  $V_{OR}$  when we execute the OR operation.

For an  $N$ -input AND operation, the output has to be i) logic-1, if all  $N$  rows in the compute subarray store  $VDD$ , ii) logic-0, otherwise. This is because the AND operation outputs logic-1 only when all of its inputs are logic-1. That is, the AND operation should output logic-1 only when  $V_{COM}$  is VDD, and output logic-0 for all other possible  $V_{COM}$  values. Thus,  $V_{AND}$  (the bitline voltage in the reference subarray) has to be between  $(N - 1)*VDD/N$  (the highest  $V_{COM}$  value for the AND operation to output logic-0) and VDD (the lowest and the only  $V_{COM}$  value for the AND operation to output logic-1). Consequently, the output will be i) logic-1 when all inputs are  $VDD$  (i.e.,  $V_{COM}=VDD$  and thus  $V_{COM}>V_{AND}$ ) and ii) logic-0 when at least one of the inputs stores  $GND$  (i.e.,  $V_{COM}\leq(N - 1)*VDD/N$  and thus  $V_{COM}<V_{AND}$ ).

For an  $N$ -input OR operation, the output has to be i) logic-0, if all  $N$  rows in the compute subarray store  $GND$ , ii) logic-1, otherwise. This is because the OR operation outputs logic-0 only when all of its inputs are logic-0. That is, the OR operation should output logic-0 only when  $V_{COM}$  is GND, and output logic-1 for all other possible  $V_{COM}$  values. Thus,  $V_{OR}$  (the bitline voltage in the reference subarray) has to be between GND (the highest and the only  $V_{COM}$  value for the OR operation to output logic-0) and  $VDD/N$  (the lowest  $V_{COM}$  value for the OR operation to output logic-1). As a result, the output will be i) logic-0 when all inputs are  $GND$  (i.e.,  $V_{COM}=GND$  and thus  $V_{COM}<V_{OR}$ ) and ii) logic-1 when at least one of the inputs stores  $VDD$  (i.e.,  $V_{COM}\geq VDD/N$  and thus  $V_{COM}>V_{OR}$ ).

**Key Mechanism.** Fig. 14 illustrates our key mechanism to perform an  $N$ -input AND operation in COTS DRAM chips. To implement an  $N$ -input AND (OR) operation, our key mechanism consists of four steps. First, for the AND (OR) operation, we store VDD (GND) in  $N - 1$  of the simultaneously activated rows in the reference subarray ( $\{R_{REF_0}, \dots, R_{REF_{N-2}}\}$ ), and VDD/2 in the other simultaneously activated row in the reference subarray ( $R_{REF_{N-1}}$ ). By doing so, for the AND operation, we set  $V_{AND}$  to  $(N - 0.5)*VDD/N$  (②) whereas for the OR operation, we set  $V_{OR}$  to  $0.5*VDD/N$ . Second, we issue  $ACT\ R_{REF} \rightarrow PRE \rightarrow ACT\ R_{COM}$  with reduced timings to simultaneously activate  $N$  rows in the reference subarray ( $\{R_{REF_0}, \dots, R_{REF_{N-1}}\}$ ) and  $N$  rows in the compute subarray ( $\{R_{COM_0}, \dots, R_{COM_{N-1}}\}$ ), as described in §6.1.1 (①). Third, we wait for the manufacturer-recommended  $t_{RAS}$  timing parameter (⑪), which overwrites the activated cells in the compute subarray with the result of the AND (OR) operation (③a and ③b). Fourth, we issue a PRE command to complete the operation (⑪).
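Under the idealized assumptions of footnote 10 (each bitline settles at the mean voltage of its activated cells, no process variation) and taking GND as 0, the reference voltages set in the first step can be checked against the decision intervals derived above. Each reference voltage sits exactly midway in its interval, so in this idealized model the margin on either side is  $VDD/(2N)$ , which shrinks as  $N$  grows (e.g.,  $VDD/4$  for  $N=2$  and  $VDD/32$  for  $N=16$ ):

$$
V_{AND} = \frac{(N-1)\,VDD + VDD/2}{N} = \frac{(N-0.5)\,VDD}{N},
\qquad
V_{OR} = \frac{(N-1)\cdot 0 + VDD/2}{N} = \frac{0.5\,VDD}{N}
$$

$$
VDD - V_{AND} = V_{AND} - \frac{(N-1)\,VDD}{N} = \frac{VDD}{2N},
\qquad
\frac{VDD}{N} - V_{OR} = V_{OR} - 0 = \frac{VDD}{2N}
$$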



Figure 14: Performing N-input AND and NAND operations in COTS DRAM chips.

**6.1.3. Performing Many-Input NAND and NOR.** We leverage the NOT operation (§5) and many-input AND and OR operations (§6.1.2) to implement NAND and NOR, as a NAND (NOR) operation is simply a chain of AND $\rightarrow$ NOT (OR $\rightarrow$ NOT) operations. During the AND (OR) operation,  $V_{COM}$  becomes the output of the AND (OR) operation, and, at the same time,  $V_{REF}$  becomes the output of the NAND (NOR) operation. For example, in Fig. 14, where we demonstrate how we perform an  $N$ -input AND operation,  $V_{REF}$  has the negated value of  $V_{COM}$  (③a and ③b), so the activated cells in the reference subarray store the output of an  $N$ -input NAND operation.
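To make the AND/NAND and OR/NOR relationship concrete, here is a behavioral Python sketch of the idealized mechanism in §6.1.2 and §6.1.3, under the simplifying assumptions of footnote 10 (mean-voltage charge sharing, no process variation). It models the logic only: it does not capture timing or the reliability effects characterized in §6.3, and the function name is illustrative.

```python
from itertools import product
from typing import List, Tuple

VDD = 1.0  # normalized supply voltage; GND is 0.0


def many_input_op(inputs: List[int], op: str) -> Tuple[int, int]:
    """Idealized N-input AND/OR with the complementary NAND/NOR result.

    Returns (compute-subarray result, reference-subarray result), i.e.,
    (AND, NAND) for op="AND" and (OR, NOR) for op="OR".
    """
    if op not in ("AND", "OR"):
        raise ValueError(f"unsupported op: {op}")
    n = len(inputs)
    # Reference subarray: N-1 rows hold VDD (AND) or GND (OR); one row holds VDD/2 (Frac).
    fill = VDD if op == "AND" else 0.0
    ref_cells = [fill] * (n - 1) + [VDD / 2]
    v_ref = sum(ref_cells) / n  # V_AND = (N-0.5)VDD/N or V_OR = 0.5VDD/N
    # Compute subarray: logic-1 operands stored as VDD, logic-0 operands as GND.
    v_com = sum(VDD * bit for bit in inputs) / n
    # Sense amplifier: the higher terminal is driven to VDD, the other to GND.
    com_result = 1 if v_com > v_ref else 0
    return com_result, 1 - com_result


# Exhaustively check the idealized model against the Boolean definitions.
for n in (2, 4, 8):
    for bits in product((0, 1), repeat=n):
        assert many_input_op(list(bits), "AND") == (int(all(bits)), int(not all(bits)))
        assert many_input_op(list(bits), "OR") == (int(any(bits)), int(not any(bits)))
print("idealized model matches AND/NAND and OR/NOR for N = 2, 4, 8")
```

Because the reference side always latches the complement of the compute side, a single command sequence yields both results; which one is useful depends only on which subarray's rows are read afterwards.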

### 6.2. Experimental Methodology

The goal of our experiment is to understand if and how well COTS DRAM chips can perform NAND, NOR, AND, and OR operations. To perform these operations, our key idea relies on simultaneously activating an equal number of rows in neighboring subarrays (i.e., the  $N:N$  activation pattern) as described in §6.1. We refer to the DRAM command sequence that activates  $N$  rows in each neighboring subarray as  $ACT\ R_{REF} \rightarrow PRE \rightarrow ACT\ R_{COM}$  where  $R_{REF}$  points to a row in the reference subarray and  $R_{COM}$  points to a row in the compute subarray.

**Testing Methodology.** Our experiment consists of four steps. First, we initialize  $N$  rows in the reference subarray. For the AND/NAND (OR/NOR) operation,  $N-1$  rows are initialized with logic-1 (logic-0). We store VDD/2 in the remaining row by performing the Frac operation [38]. Second, we store  $N$  input operands in  $N$  rows in the compute subarray. Third, we issue the  $ACT\ R_{REF} \rightarrow PRE \rightarrow ACT\ R_{COM}$  command sequence with reduced  $t_{RAS}$  and  $t_{RP}$  timings. Fourth, we wait for the manufacturer-recommended  $t_{RAS}$  and precharge the tested bank. After the four-step procedure, we read all rows in the reference and the compute subarrays. If a COTS DRAM chip can perform AND/NAND (OR/NOR) operations, the  $N$  simultaneously activated rows in the compute subarray will store the result of the AND (OR) operation, and the  $N$  simultaneously activated rows in the reference subarray will store the result of the NAND (NOR) operation.
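The final check of the procedure can be sketched as a per-cell comparison between the rows read back and the expected AND/NAND (OR/NOR) results. This is a hypothetical helper, not the interface of the open-source infrastructure; it assumes each row is represented as a Python list of bits and that the  $N$  input operands are known.

```python
def check_trial(operands, compute_rows, reference_rows, op):
    """Compare the rows read back after one trial with the expected results.

    operands:        N input rows written to the compute subarray (lists of bits)
    compute_rows:    N compute-subarray rows read back after the operation
    reference_rows:  N reference-subarray rows read back after the operation
    op:              "and" (expects AND / NAND) or "or" (expects OR / NOR)
    Returns per-cell booleans for the compute rows and the reference rows.
    """
    combine = all if op == "and" else any
    # Expected compute-side value of each column (bitline) across the N operands.
    expected = [int(combine(column)) for column in zip(*operands)]
    # The reference side is expected to hold the complement (NAND / NOR).
    inverted = [1 - bit for bit in expected]
    compute_ok = [[got == exp for got, exp in zip(row, expected)]
                  for row in compute_rows]
    reference_ok = [[got == exp for got, exp in zip(row, inverted)]
                    for row in reference_rows]
    return compute_ok, reference_ok
```

For instance, with two 4-bit operand rows `[1, 1, 0, 0]` and `[1, 0, 1, 0]`, an AND trial expects `[1, 0, 0, 0]` in every compute row and `[0, 1, 1, 1]` in every reference row.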

**Metric.** We use the same metric that we use to evaluate the reliability of a NOT operation: *success rate*. The success rate of a DRAM cell is the percentage of trials, out of all 10000 tested trials, in which the DRAM cell stores the correct output of a Boolean operation (i.e., the AND or OR operation for a cell in the compute subarray and the NAND or NOR operation for a cell in the reference subarray). For example, if a cell in the compute subarray stores the correct output of the 16-input AND operation in 1000 out of 10000 trials, the success rate of the cell is 10% for the 16-input AND operation. We define the *average success rate* as the mean success rate across all tested DRAM cells.
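Both metrics reduce to simple arithmetic over per-trial outcomes. A minimal sketch, assuming the outcomes of each cell are recorded as a list of booleans (one per trial); the function names are hypothetical.

```python
def success_rate(outcomes):
    """Percentage of trials in which one cell held the correct result."""
    return 100.0 * sum(outcomes) / len(outcomes)


def average_success_rate(per_cell_outcomes):
    """Mean of the per-cell success rates over all tested cells."""
    rates = [success_rate(outcomes) for outcomes in per_cell_outcomes]
    return sum(rates) / len(rates)
```

With 1000 correct outcomes out of 10000 trials, `success_rate` returns 10.0, matching the example above.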

**Number of Tested DRAM Cells.** We test all 16 banks in each tested SK Hynix chip. We perform our experiments in four randomly selected neighboring subarray pairs (i.e., a total of eight subarrays) in each bank. For each neighboring subarray pair, we test DRAM cells that can perform NOT operations with >90% success rate at 50°C.<sup>8</sup>

**Data Pattern.** We use two data patterns: 1) the all-1s/0s patterns, where each row in the compute subarray is filled completely either with logic-1 or logic-0 values (e.g., for 4-input operations there are 16 unique such data patterns because each of the simultaneously activated four rows can either store all-1s or all-0s), and 2) the random data pattern, where we fill each row in the compute subarray with random data. All experiments are conducted using the random data pattern unless stated otherwise.
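The all-1s/0s patterns can be enumerated directly, since each of the  $N$  rows is independently filled with all-1s or all-0s, giving  $2^N$  patterns. The sketch below is illustrative only: the row width `row_bits` is a hypothetical parameter kept small here, and the seeded generator stands in for whatever random source the real infrastructure uses.

```python
import random
from itertools import product


def all_ones_zeros_patterns(n):
    """Every assignment of all-1s or all-0s to N compute rows (2**N patterns).
    Each pattern is a tuple giving the bit that fills each row."""
    return list(product((0, 1), repeat=n))


def expand(pattern, row_bits):
    """Turn one all-1s/0s pattern into N rows of `row_bits` identical bits."""
    return [[bit] * row_bits for bit in pattern]


def random_pattern(n, row_bits, seed=None):
    """N rows, each filled with `row_bits` random bits."""
    rng = random.Random(seed)
    return [[rng.getrandbits(1) for _ in range(row_bits)] for _ in range(n)]
```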

**Temperature.** We perform our experiments at five temperature levels: 50°C, 60°C, 70°C, 80°C, and 95°C. All experiments are conducted at 50°C unless stated otherwise.

### 6.3. COTS DRAM Chip Characterization

We characterize the reliability of AND, NAND, OR, and NOR operations in SK Hynix chips. While we test chips from all three major manufacturers, we note that we do *not* observe AND, NAND, OR, and NOR operations in Samsung and Micron chips.<sup>4</sup>

**Number of Input Operands.** We evaluate the reliability of AND, NAND, OR, and NOR operations with 2, 4, 8, and 16 input operands. Fig. 15 shows the success rate distribution across DRAM cells in all tested SK Hynix DRAM chips in a box and whiskers plot as we vary the number of input operands from 2 to 16. We make Observations 10-13 from Fig. 15.



Figure 15: Success rates of AND, NAND, OR, and NOR operations in COTS DRAM chips with varying numbers of input operands.

**Observation 10.** We can perform 2-input, 4-input, 8-input, and 16-input AND, NAND, OR, and NOR operations with very high success rates in COTS DRAM chips.

COTS DRAM chips can perform AND, NAND, OR, and NOR operations with 2, 4, 8, and 16 input operands with a high success rate. For example, we can perform 16-input AND, NAND, OR, and NOR operations in COTS DRAM chips with average success rates of 94.94%, 94.94%, 95.85%, and 95.87%, respectively.

**Observation 11.** The success rate of bitwise operations consistently increases as the number of input operands increases.

For instance, 16-input AND operations achieve 10.27% higher average success rate than 2-input AND operations. This trend is consistent for all tested bitwise operations.

**Observation 12.** OR (NOR) operations have higher success rate than AND (NAND) operations.

For example, 2-input OR (NOR) operations achieve 10.42% (10.60%) higher average success rate than 2-input AND (NAND) operations, and 16-input OR (NOR) operations have 0.96% (0.97%) higher average success rate than 16-input AND (NAND) operations.

**Observation 13.** There is a small average success rate difference between 1) AND - NAND and 2) OR - NOR operations.

For example, the average success rate difference between two-input AND operations and two-input NAND operations is 0.50%, and the average success rate difference between two-input OR operations and two-input NOR operations is 0.40%.

**Number of Logic-1s in the Input Operands.** We analyze the impact of the number of logic-1s in the input operands on the success rate of the tested bitwise operations. To understand their impact, we analyze 4-input AND and OR and 16-input AND and OR operations. Fig. 16 shows the success rate of 4-input and 16-input AND (top) and OR (bottom) operations across all tested DRAM cells as we vary the number of logic-1s (x-axis) in their input operands. We make Observation 14 from Fig. 16.



Figure 16: Success rates of AND and OR operations based on the number of logic-1s in the input operands.

**Observation 14.** The success rate of AND and OR operations heavily depends on the number of logic-1s in the input operands. AND operations experience the worst success rate when all the inputs have logic-1 or only one input has logic-0 whereas OR operations experience the worst success rate when only one input or no inputs have logic-1.

For example, for the 16-input (4-input) AND operation, the average success rate decreases by 52.43% (45.43%) when we increase the number of logic-1s in the input operands from zero to fifteen (four). Similarly, for the 16-input (4-input) OR operation, the average success rate decreases by 53.66% (21.46%) when we decrease the number of logic-1s in the input operands from sixteen (four) to one (zero).

We hypothesize that, for certain input operand values of AND and OR operations, the voltage difference between the two terminals of a sense amplifier is too small for the sense amplifier to reliably amplify the bitline voltage (as sense amplifiers are *not* designed to reliably support simultaneous multiple-row activation). For example, in a 4-input AND operation, the voltage level on the reference subarray's bitlines is 3.5VDD.<sup>10</sup> When all inputs of the AND operation are logic-1, the voltage level on each bitline in the compute subarray is 4VDD, and when all inputs of the AND operation are logic-0, the voltage level is GND (or 0VDD). As a result, the voltage difference between the two sense amplifier terminals is significantly smaller in an AND operation with all logic-1 inputs (0.5VDD) than in an AND operation with all logic-0 inputs (3.5VDD).
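The hypothesis can be explored in the same summed notation used above (bitline voltages not divided by  $N$ ), where the reference bitline sits at  $(N - 0.5)VDD$  for AND and  $0.5VDD$  for OR. Under an idealized model that ignores noise and variation (an assumption), the sketch below computes the margin  $|V_{COM} - V_{REF}|$  for every number of logic-1 inputs and reports the input counts with the smallest margin. These coincide with the worst-case input counts in Observation 14, which is consistent with, though not proof of, the hypothesis. The function names are hypothetical.

```python
def sense_margins(n, op):
    """Idealized sense margin |V_COM - V_REF| for each number k of logic-1
    inputs, in summed notation (VDD units, not divided by N)."""
    v_ref = n - 0.5 if op == "and" else 0.5
    return {k: abs(k - v_ref) for k in range(n + 1)}


def worst_cases(n, op):
    """Numbers of logic-1 inputs that leave the smallest sense margin."""
    margins = sense_margins(n, op)
    smallest = min(margins.values())
    return sorted(k for k, m in margins.items() if m == smallest)
```

For a 4-input AND, `worst_cases(4, "and")` returns `[3, 4]` (one input at logic-0, or all inputs at logic-1), and for a 4-input OR, `worst_cases(4, "or")` returns `[0, 1]`.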

**Distance to the Sense Amplifier.** Fig. 17 shows the average success rates of AND, NAND, OR, and NOR operations (i.e., the mean success rate of every tested DRAM cell for all tested numbers of input operands) using a heatmap plot based on the distance of all simultaneously activated rows in the compute subarray (y-axis) and all simultaneously activated rows in the reference subarray (x-axis) to the sense amplifiers that physically reside between (i.e., shared by) the compute subarray and the reference subarray. We make Observation 15 from Fig. 17.



**Figure 17: Success rates of AND, NAND, OR, and NOR operations based on the distance of activated rows to sense amplifiers.**

**Observation 15.** The success rates of AND, NAND, OR, and NOR operations significantly vary based on the distance of the simultaneously activated rows to the sense amplifiers.

The physical location of activated rows can lead to variations in the average success rate of up to 23.36% for AND, 23.70% for NAND, 10.42% for OR, and 10.50% for NOR operations. We hypothesize that the large variation in the success rate can indicate a strong influence of *design-induced variation* [106]. Design-induced variation causes cell access latency characteristics to vary deterministically based on a cell's physical location in the memory chip (e.g., its distance to the sense amplifier).

**Data Pattern.** We analyze the effect of data pattern on the success rate of each operation as we vary the number of input operands from 2 to 16. Fig. 18 shows the success rate distribution across all tested DRAM cells for two data patterns: the all-1s/0s data pattern and the random data pattern. We make Observation 16 from Fig. 18.



**Figure 18: Success rates of AND, NAND, OR, and NOR operations with different data patterns.**

**Observation 16.** Data pattern slightly affects the success rate of AND, NAND, OR, and NOR operations.

For example, AND operations with random data patterns have 1.43% lower average success rate than AND operations with all-1s/0s across every tested number of input operands. NAND, OR, and NOR operations also have higher success rates with all-1s/0s than with random data patterns. We hypothesize that the variation in success rate across tested data patterns could be caused by *parasitic coupling capacitance between adjacent bitlines* [69, 84, 115–118]. Prior works [69, 84, 115–118] show the effect of parasitic coupling capacitance between adjacent bitlines: adjacent cells can disturb each other depending on the values stored in them, which can result in a failure during charge sharing or sensing and amplification operations.

**Temperature.** Fig. 19 shows the success rate distribution of AND, NAND, OR, and NOR operations at five different temperature levels: 50°C, 60°C, 70°C, 80°C, and 95°C. The figure clusters boxes into six groups based on the number of input operands. In each cluster of boxes, temperature increases from left to right. We make Observation 17 from Fig. 19.



**Figure 19: Success rates of AND, NAND, OR, and NOR operations at different DRAM chip temperatures.**

**Observation 17.** Temperature has a small effect on the success rate of AND, NAND, OR, and NOR operations in COTS DRAM chips.

When the DRAM chip temperature increases from 50°C to 95°C, the highest variation (across all x-axis values) in the average success rate of AND, NAND, OR, and NOR operations is 1.66%, 1.65%, 1.63%, and 1.64%, respectively.

**Takeaway 4.** COTS DRAM chips can perform up to 16-input AND, NAND, OR, and NOR Boolean operations (at very high success rates). These Boolean operations are highly resilient to temperature changes.

**DRAM Speed Rate.** We analyze the effect of DRAM speed rate on the success rate of AND, NAND, OR, and NOR operations. Fig. 20 shows the success rate distribution of AND, NAND, OR, and NOR operations across DRAM cells, where the x-axis shows the number of input operands and the hue of a box shows the DRAM speed rate. In each cluster of boxes, the DRAM speed rate increases from left to right. We make Observation 18 from Fig. 20.



Figure 20: Success rates of AND, NAND, OR, and NOR operations for three DRAM speed rates.

**Observation 18.** The DRAM speed rate significantly affects the success rate of AND, NAND, OR, and NOR operations.

For example, for the 4-input NAND operation, the success rate reduces by 29.89% when the DRAM speed rate increases from 2133 MT/s to 2400 MT/s.

**Chip Density and Die Revision.** We analyze the effect of chip density and die revision on the reliability of many-input AND, NAND, OR, and NOR operations. To do so, we use SK Hynix DRAM modules spanning different chip densities and die revisions: 4Gb A-die, 4Gb M-die, 8Gb A-die, and 8Gb M-die.<sup>2</sup> Fig. 21 shows the success rate distribution across DRAM cells, with the x-axis representing the module’s chip density and die revision.

**Observation 19.** Chip density and die revision affect the success rate of many-input AND, NAND, OR, and NOR operations.

For example, the average success rate of the two-input AND operation decreases by 27.47% from 4Gb A-die to 4Gb M-die, whereas the two-input AND operation’s average success rate increases by 2.11% from 8Gb A-die to 8Gb M-die.

Figure 21: Success rates of many-input AND, NAND, OR, and NOR operations for different chip density and die revision combinations.

<sup>12</sup>We note that the tested 8Gb M-die SK Hynix module can perform only up to 8-input bitwise operations, as we observe that we can simultaneously activate at most 16 rows in neighboring subarrays (i.e., 8:8 activation). We provide every tested DRAM module’s computational capability in the extended version of this paper [101].

**Takeaway 5.** The reliability of many-input NAND, NOR, AND, and OR operations significantly varies across DRAM chips with different speed rates, die revisions, and chip densities.

## 7. Limitations of Tested COTS DRAM Chips

We demonstrated that COTS DRAM chips are fundamentally capable of performing functionally-complete Boolean operations using 22 DRAM modules from two major manufacturers, and provided underlying hypotheses for why we observe such computation capability based on the operational principles of modern DRAM chips.

In this section, we discuss two key limitations of COTS DRAM chips in performing functionally-complete Boolean operations.

**Limitation 1: Some COTS DRAM chips do not support all Boolean operations.** While we test COTS DRAM chips from all three major manufacturers (i.e., SK Hynix, Samsung, and Micron), we report major positive results and detailed evaluations in chips from two major manufacturers, SK Hynix and Samsung. In SK Hynix chips, we can *simultaneously* and *sequentially* activate multiple rows in neighboring subarrays, and thus, we can perform *all* the tested bitwise operations. In Samsung chips, we can *only* perform sequential row activation, and thus, we observe only the NOT operation. In Micron chips, we do *not* observe simultaneous or sequential multiple-row activation in neighboring subarrays. Hence, we do not observe any bitwise operations in Micron chips. Prior work [107] proposes a hypothesis that could potentially explain the behavior of Samsung and Micron chips: these DRAM chips ignore a DRAM command when the command greatly violates manufacturer-recommended timing parameters, i.e., the DRAM chip acts as if it did not receive the command. If such a limitation were not imposed, we believe these DRAM chips would also be fundamentally capable of performing the operations we examine in this work.

**Limitation 2: Tested COTS DRAM chips support *only* up to 16-input Boolean operations.** We simultaneously activate up to 48 rows in two neighboring subarrays, enabling us to perform up to 16-input NAND, NOR, AND, and OR operations. It is possible that other untested DRAM chips can simultaneously activate more rows, allowing them to perform bulk bitwise operations with more than 16 inputs. We hypothesize that the number of simultaneously activated rows and the location of the activated rows depend on how the row decoder circuitry of a DRAM chip is designed. We believe that making the row decoder circuitry of DRAM chips publicly available could help researchers develop a better understanding of the computational capability of COTS DRAM chips without exhaustive reverse engineering efforts.

We believe and hope that our work inspires future DRAM designs that explicitly and reliably support such operations building on our proof-of-concept demonstration in existing COTS DRAM chips.

## 8. Related Work

To our knowledge, this is the first work that demonstrates 1) functionally-complete Boolean operations and 2) many-input (i.e., more than two-input) NAND, NOR, AND and OR operations in COTS DRAM chips. We discuss related works in three synergistic directions: 1) Processing-using-DRAM (PuD) in COTS DRAM chips, 2) PuD in modified DRAM chips, and 3) simultaneous multiple-row activation in two subarrays.

### 8.1. PuD in COTS DRAM Chips

**Bulk Bitwise Operations [38, 41, 45].** ComputeDRAM [45] activates three rows simultaneously (i.e., triple-row activation [29–31, 48, 50, 51]) in off-the-shelf DDR3 chips to perform three-input majority and two-input AND and OR operations [29–31, 48, 50, 51]. FracDRAM [38] shows that a DRAM cell in COTS DDR3 chips can store fractional values (e.g., VDD/2). FracDRAM uses fractional values to perform MAJ3 operations while simultaneously activating four DRAM rows in the same subarray. DRAM Bender [41] demonstrates two-input AND and OR operations in COTS DDR4 chips. None of these works demonstrate functionally-complete Boolean operations or many-input Boolean operations.

**Bulk Data Copy and Initialization [42, 45].** ComputeDRAM [45] demonstrates bulk data copy operations in DRAM row granularity (i.e., the RowClone [49] operation) in COTS DDR3 chips. PiDRAM [42] provides a flexible end-to-end FPGA-based framework that enables real system studies of PuD techniques, such as RowClone [45, 49].

**Security Primitives [37, 39, 40, 119–126].** Prior works demonstrate DRAM-based true random number generators (TRNGs) [37, 39, 119–121, 125, 126] and physical unclonable functions (PUFs) [40, 120–124] using COTS DRAM chips. We highlight one state-of-the-art DRAM-based TRNG [37] that generates true random numbers in COTS DDR4 chips by simultaneously activating four rows in the same subarray. Our key observation, simultaneously activating multiple rows in *neighboring* subarrays, could also be leveraged to generate true random numbers.

### 8.2. PuD in Modified DRAM Chips

Various prior works [26, 29–31, 33, 35, 36, 43, 44, 46, 48–58, 127, 128] enable bulk operations in DRAM chips by *modifying* the DRAM circuitry. We demonstrate that COTS DRAM chips are capable of performing functionally-complete Boolean operations. We believe truly supporting operations that we demonstrate in DRAM requires proper modifications to DRAM circuitry and standards. Yet, our demonstration shows that existing COTS DRAM chips are already quite capable of computation, and such modifications to DRAM are very promising and likely to be very fruitful.

### 8.3. Simultaneously Activating Two Rows in Two Different Subarrays in COTS DRAM Chips

A prior work (HiRA) [107] demonstrates that real DRAM chips are capable of activating two rows (in quick succession) in electrically isolated (i.e., *not* physically adjacent) subarrays (called *hidden row activation*). This work uses hidden row activation to parallelize a DRAM row’s refresh operation with refresh or activation of other rows in the same bank.

## 9. Conclusion

In this work, we experimentally demonstrate that COTS DRAM chips can perform functionally-complete Boolean operations (i.e., NOT and AND/OR, NAND, and NOR) and many-input (i.e., up to 16-input) NAND, NOR, AND and OR operations. Through an extensive characterization using 256 COTS DDR4 DRAM chips from 22 DRAM modules, we show that COTS DRAM chips can execute NOT and many-input NAND, NOR, AND, and OR operations with high reliability, and data pattern and temperature changes only slightly affect their reliability. We believe that our empirical results demonstrate the potential of using DRAM as a powerful computation substrate. We hope future works build upon our comprehensive study to better characterize, understand, and enhance the computational capability of DRAM chips.

## Acknowledgements

We thank the anonymous reviewers of HPCA 2024 for their encouraging feedback. We thank the SAFARI Research Group members for providing a stimulating intellectual environment. We acknowledge the generous gifts from our industrial partners, including Google, Huawei, Intel, and Microsoft. This work is supported in part by the Semiconductor Research Corporation (SRC), the ETH Future Computing Laboratory (EFCL), and the AI Chip Center for Emerging Smart Systems (ACCESS).

## References

- [1] O. Mutlu, S. Ghose, J. Gómez-Luna, and R. Ausavarungnirun, “Processing Data Where It Makes Sense: Enabling In-Memory Computation,” in *Microprocessors and Microsystems*, 2019.
- [2] O. Mutlu, S. Ghose, J. Gómez-Luna, and R. Ausavarungnirun, “A Modern Primer on Processing in Memory,” in *Emerging Computing: From Devices to Systems — Looking Beyond Moore and Von Neumann*. Springer, 2021. [Online]. Available: <https://arxiv.org/abs/2012.03112>
- [3] O. Mutlu, “Memory Scaling: A Systems Architecture Perspective,” in *IMW*, 2013.
- [4] O. Mutlu and L. Subramanian, “Research Problems and Opportunities in Memory Systems,” *SUPERFI*, 2014.
- [5] J. Dean and L. A. Barroso, “The Tail at Scale,” *CACM*, 2013.
- [6] S. Kanev, J. P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G.-Y. Wei, and D. Brooks, "Profiling a Warehouse-Scale Computer," in *ISCA*, 2015.
- [7] M. Ferdman, A. Adileh, O. Koçberber, S. Volos, M. Alisafaei, D. Jevdic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, "Clearing the Clouds: A Study of Emerging Scale-Out Workloads on Modern Hardware," in *ASPLOS*, 2012.
- [8] L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang, Y. He, W. Gao, Z. Jia, Y. Shi, S. Zhang, C. Zheng, G. Lu, K. Zhan, X. Li, and B. Qiu, "Bigdatabench: A Big Data Benchmark Suite from Internet Services," in *HPCA*, 2014.
- [9] O. Mutlu, S. Ghose, J. Gómez-Luna, and R. Ausavarungnirun, "Enabling Practical Processing in and Near Memory For Data-Intensive Computing," in *DAC*, 2019.
- [10] O. Mutlu, "Intelligent Architectures for Intelligent Machines," in *VLSI-DAT*, 2020.
- [11] S. Ghose, A. Boroumand, J. S. Kim, J. Gómez-Luna, and O. Mutlu, "Processing-in-Memory: A Workload-Driven Perspective," *IBM JRD*, 2019.
- [12] A. Boroumand, S. Ghose, Y. Kim, R. Ausavarungnirun, E. Shiu, R. Thakur, D. Kim, A. Kuusela, A. Knies, P. Ranganathan, and O. Mutlu, "Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks," in *ASPLOS*, 2018.
- [13] A. Boroumand, S. Ghose, B. Akin, R. Narayanaswami, G. F. Oliveira, X. Ma, E. Shiu, and O. Mutlu, "Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks," in *PACT*, 2021.
- [14] S. Wang and E. Ipek, "Reducing Data Movement Energy via Online Data Clustering and Encoding," in *MICRO*, 2016.
- [15] D. Pandian and C.-J. Wu, "Quantifying the Energy Cost of Data Movement for Emerging Smart Phone Workloads on Mobile Platforms," in *IISWC*, 2014.
- [16] S. Koppula, L. Orosa, A. G. Yağlıkçı, R. Azizi, T. Shahroodi, K. Kanellopoulos, and O. Mutlu, "EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM," in *MICRO*, 2019.
- [17] U. Kang, H.-S. Yu, C. Park, H. Zheng, J. Halbert, K. Bains, S. Jang, and J. S. Choi, "Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling," in *The Memory Forum*, 2014.
- [18] S. A. McKee *et al.*, "Reflections on the Memory Wall," *CF*, 2004.
- [19] M. V. Wilkes, "The Memory Gap and the Future of High Performance Memories," *CAN*, 2001.
- [20] Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu, "A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM," in *ISCA*, 2012.
- [21] W. A. Wulf and S. A. McKee, "Hitting the Memory Wall: Implications of the Obvious," *CAN*, 1995.
- [22] S. Ghose, T. Li, N. Hajinazar, D. S. Cali, and O. Mutlu, "Demystifying Complex Workload-DRAM Interactions: An Experimental Study," in *SIGMETRICS*, 2020.
- [23] J. Ahn, S. Hong, S. Yoo, O. Mutlu, and K. Choi, "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing," in *ISCA*, 2015.
- [24] J. Ahn, S. Yoo, O. Mutlu, and K. Choi, "PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture," in *ISCA*, 2015.
- [25] K. Hsieh, E. Ebrahimi, G. Kim, N. Chatterjee, M. O'Connor, N. Vijaykumar, O. Mutlu, and S. W. Keckler, "Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems," in *ISCA*, 2016.
- [26] Y. Wang, L. Orosa, X. Peng, Y. Guo, S. Ghose, M. Patel, J. S. Kim, J. G. Luna, M. Sadrosadati, N. M. Ghiasi *et al.*, "FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching," in *MICRO*, 2020.
- [27] R. Sites, "It's the Memory, Stupid!" *MPR*, 1996.
- [28] G. F. Oliveira, J. Gómez-Luna, L. Orosa, S. Ghose, N. Vijaykumar, I. Fernandez, M. Sadrosadati, and O. Mutlu, "DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks," *IEEE Access*, 2021.
- [29] V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry, "Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology," in *MICRO*, 2017.
- [30] V. Seshadri, K. Hsieh, A. Boroumand, D. Lee, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry, "Fast Bulk Bitwise AND and OR in DRAM," in *CAL*, 2015.
- [31] V. Seshadri and O. Mutlu, "In-DRAM Bulk Bitwise Execution Engine," *arXiv:1905.09822*, 2019.
- [32] N. Hajinazar, G. F. Oliveira, S. Gregorio, J. D. Ferreira, N. M. Ghiasi, M. Patel, M. Alser, S. Ghose, J. Gómez-Luna, and O. Mutlu, "SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM," in *ASPLOS*, 2021.
- [33] S. Angizi and D. Fan, "GraphiDe: A Graph Processing Accelerator Leveraging In-DRAM-Computing," in *GLSVLSI*, 2019.
- [34] F. N. Bostancı, A. Olgun, L. Orosa, A. G. Yaglikci, J. S. Kim, H. Hassan, O. Ergin, and O. Mutlu, "DR-STRaNGe: End-to-End System Design for DRAM-based True Random Number Generators," in *HPCA*, 2022.
- [35] J. D. Ferreira, G. Falcao, J. Gómez-Luna, M. Alser, L. Orosa, M. Sadrosadati, J. S. Kim, G. F. Oliveira, T. Shahroodi, A. Nori *et al.*, "pLUTO: Enabling Massively Parallel Computation in DRAM via Lookup Tables," in *MICRO*, 2022.
- [36] S. Li, A. O. Glova, X. Hu, P. Gu, D. Niu, K. T. Malladi, H. Zheng, B. Brennan, and Y. Xie, "SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator," in *MICRO*, 2018.
- [37] A. Olgun, M. Patel, A. G. Yağlıkçı, H. Luo, J. S. Kim, N. Bostancı, N. Vijaykumar, O. Ergin, and O. Mutlu, "QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips," in *ISCA*, 2021.
- [38] F. Gao, G. Tziantzioulis, and D. Wentzlaff, "FracDRAM: Fractional Values in Off-the-Shelf DRAM," in *MICRO*, 2022.
- [39] J. S. Kim, M. Patel, H. Hassan, L. Orosa, and O. Mutlu, "D-RaNGe: Using Commodity DRAM Devices to Generate True Random Numbers with Low Latency and High Throughput," in *HPCA*, 2019.
- [40] J. S. Kim, M. Patel, H. Hassan, and O. Mutlu, "The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency–Reliability Tradeoff in Modern Commodity DRAM Devices," in *HPCA*, 2018.
- [41] A. Olgun, H. Hassan, A. G. Yağlıkçı, Y. C. Tuğrul, L. Orosa, H. Luo, M. Patel, O. Ergin, and O. Mutlu, "DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips," *TCAD*, 2023.
- [42] A. Olgun, J. G. Luna, K. Kanellopoulos, B. Salami, H. Hassan, O. Ergin, and O. Mutlu, "PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM," *TACO*, 2022.
- [43] M. Besta, R. Kanakagiri, G. Kwasniewski, R. Ausavarungnirun, J. Beránek, K. Kanellopoulos, K. Janda, Z. Vonarbog-Shmarija, L. Gianinazzi, I. Stefan *et al.*, "SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems," in *MICRO*, 2021.
- [44] Q. Deng, L. Jiang, Y. Zhang, M. Zhang, and J. Yang, "DrAcc: A DRAM Based Accelerator for Accurate CNN Inference," in *DAC*, 2018.
- [45] F. Gao, G. Tziantzioulis, and D. Wentzlaff, "ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs," in *MICRO*, 2019.
- [46] S. Li, D. Niu, K. T. Malladi, H. Zheng, B. Brennan, and Y. Xie, "DRISA: A DRAM-Based Reconfigurable In-Situ Accelerator," in *MICRO*, 2017.
- [47] G. F. Oliveira, J. Gómez-Luna, S. Ghose, A. Boroumand, and O. Mutlu, "Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud," *IEEE Micro*, 2022.
- [48] V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry, "Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM," *arXiv:1611.09988 [cs.AR]*, 2016.
- [49] V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. Mowry, "RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization," in *MICRO*, 2013.
- [50] V. Seshadri and O. Mutlu, "The Processing Using Memory Paradigm: In-DRAM Bulk Copy, Initialization, Bitwise AND and OR," *arXiv:1610.09603 [cs.AR]*, 2016.
- [51] V. Seshadri and O. Mutlu, "Simple Operations in Memory to Reduce Data Movement," in *Advances in Computers, Volume 106*, 2017.
- [52] V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, "RowClone: Accelerating Data Movement and Initialization Using DRAM," *arXiv:1805.03502 [cs.AR]*, 2018.
- [53] X. Xin, Y. Zhang, and J. Yang, "ELP2IM: Efficient and Low Power Bitwise Operation Processing in DRAM," in *HPCA*, 2020.
- [54] L. Wu, R. Sharifi, A. Venkat, and K. Skadron, "DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching," in *CAL*, 2022.
- [55] X. Xin, Y. Zhang, and J. Yang, "ROC: DRAM-Based Processing with Reduced Operation Cycles," in *DAC*, 2019.
- [56] Q. Deng, Y. Zhang, M. Zhang, and J. Yang, "LAcc: Exploiting Lookup Table-Based Fast and Accurate Vector Multiplication in DRAM-Based CNN Accelerator," in *DAC*, 2019.
- [57] P. R. Sutradhar, S. Bavikadi, M. Connolly, S. Prajapati, M. A. Indovina, S. M. P. Dinakarao, and A. Ganguly, "Look-Up-Table Based Processing-in-Memory Architecture with Programmable Precision-Scaling for Deep Learning Applications," *TPDS*, 2021.
- [58] P. R. Sutradhar, M. Connolly, S. Bavikadi, S. M. P. Dinakarao, M. A. Indovina, and A. Ganguly, "pPIM: A Programmable Processor-in-Memory Architecture with Precision-Scaling For Deep Learning," in *CAL*, 2020.
- [59] H. Luo, T. Shahroodi, H. Hassan, M. Patel, A. G. Yaglikci, L. Orosa, J. Park, and O. Mutlu, "CLR-DRAM: A Low-Cost DRAM Architecture Enabling Dynamic Capacity-Latency Trade-Off," in *ISCA*, 2020.
- [60] K. K. Chang, P. J. Nair, D. Lee, S. Ghose, M. K. Qureshi, and O. Mutlu, "Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAM," in *HPCA*, 2016.
- [61] T. Sekiguchi, K. Itoh, T. Takahashi, M. Sugaya, H. Fujisawa, M. Nakamura, K. Kajigaya, and K. Kimura, "A Low-Impedance Open-Bitline Array for Multigigabit DRAM," *JSSC*, 2002.
- [62] B. Jacob, "The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It," *Synthesis Lectures on Computer Architecture*, vol. 4, 2009.
- [63] B. Jacob, S. Ng, and D. Wang, *Memory Systems: Cache, DRAM, Disk*. Morgan Kaufmann, 2008.
- [64] K. Itoh, *VLSI Memory Chip Design*. Springer, 2001.
- [65] B. Keeth *et al.*, *DRAM Circuit Design. Fundamental and High-Speed Topics*. Wiley-IEEE Press, 2007.
- [66] T. Schloesser, F. Jakubowski, J. v. Kluge, A. Graham, S. Slesazeck, M. Popp, P. Baars, K. Muennler, P. Moll, K. Wilson, A. Buerke, D. Koehler, J. Raddecker, E. Erben, U. Zimmermann, T. Vorrath, B. Fischer, G. Aichmayr, R. Agaiby, W. Pamler, T. Schuster, W. Bergner, and W. Mueller, "A 6F2 Buried Wordline DRAM Cell for 40nm and Beyond," in *IEDM*, 2008.
- [67] N. H. Weste and D. Harris, *CMOS VLSI Design: A Circuits and Systems Perspective*. Pearson Education India, 2015.
- [68] D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, and O. Mutlu, "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," in *HPCA*, 2013.
- [69] S. Khan, D. Lee, and O. Mutlu, "PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM," in *DSN*, 2016.
- [70] H. Hassan, M. Patel, J. S. Kim, A. G. Yağlıkçı, N. Vijaykumar, N. Mansouri Ghiasi, S. Ghose, and O. Mutlu, "CROW: A Low-Cost Substrate for Improving DRAM Performance, Energy Efficiency, and Reliability," in *ISCA*, 2019.
- [71] S. Ghose, A. G. Yaglikci, R. Gupta, D. Lee, K. Kudrolli, W. X. Liu, H. Hassan, K. K. Chang, N. Chatterjee, A. Agrawal *et al.*, "What Your DRAM Power Models Are Not Telling You: Lessons From A Detailed Experimental Study," in *SIGMETRICS*, 2018.
- [72] Y. Kim, W. Yang, and O. Mutlu, "Ramulator: A Fast and Extensible DRAM Simulator," *CAL*, 2016.
- [73] T. Zhang, K. Chen, C. Xu, G. Sun, T. Wang, and Y. Xie, "Half-DRAM: A High-Bandwidth and Low-Power DRAM Architecture from the Rethinking of Fine-Grained Activation," in *ISCA*, 2014.
- [74] H. Hassan, G. Pekhimenko, N. Vijaykumar, V. Seshadri, D. Lee, O. Ergin, and O. Mutlu, "ChargeCache: Reducing DRAM Latency by Exploiting Row Access Locality," in *HPCA*, 2016.
- [75] K. K. Chang, A. G. Yağlıkçı, S. Ghose, A. Agrawal, N. Chatterjee, A. Kashyap, D. Lee, M. O'Connor, H. Hassan, and O. Mutlu, "Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms," in *SIGMETRICS*, 2017.
- [76] K. K. Chang, "Understanding and Improving the Latency of DRAM-Based Memory Systems," Ph.D. dissertation, Carnegie Mellon University, 2017.
- [77] K. K. Chang, A. Kashyap, H. Hassan, S. Ghose, K. Hsieh, D. Lee, T. Li, G. Pekhimenko, S. Khan, and O. Mutlu, "Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization," in *SIGMETRICS*, 2016.
- [78] K. K. Chang, D. Lee, Z. Chishti, A. R. Alameldeen, C. Wilkerson, Y. Kim, and O. Mutlu, "Improving DRAM Performance by Parallelizing Refreshes with Accesses," in *HPCA*, 2014.
- [79] K. K. Chang, P. J. Nair, D. Lee, S. Ghose, M. K. Qureshi, and O. Mutlu, "Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAM," in *HPCA*, 2016.
- [80] D. Lee, Y. Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, and O. Mutlu, "Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case," in *HPCA*, 2015.
- [81] D. Lee, S. Khan, L. Subramanian, R. Ausavarungnirun, G. Pekhimenko, V. Seshadri, S. Ghose, and O. Mutlu, "Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips," in *SIGMETRICS*, 2017.
- [82] D. Lee, "Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity," Ph.D. dissertation, Carnegie Mellon University, 2016.
- [83] D. Lee, L. Subramanian, R. Ausavarungnirun, J. Choi, and O. Mutlu, "Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM," in *PACT*, 2015.
- [84] J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, and O. Mutlu, "An Experimental Study of Data Retention Behavior in Modern DRAM Devices," in *ISCA*, 2013.
- [85] J. Liu, B. Jaiyen, R. Veras, and O. Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," in *ISCA*, 2012.
- [86] V. Seshadri, T. Mullins, A. Boroumand, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, "Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-Unit Strided Accesses," in *MICRO*, 2015.
- [87] E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana, "Self-Optimizing Memory Controllers: A Reinforcement Learning Approach," in *ISCA*, 2008.
- [88] D. Lee, S. Ghose, G. Pekhimenko, S. Khan, and O. Mutlu, "Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost," *TACO*, 2016.
- [89] R. H. Dennard, "Field-Effect Transistor Memory," 1968, U.S. Patent 3,387,286.
- [90] B. Keeth, R. J. Baker, B. Johnson, and F. Lin, *DRAM Circuit Design: Fundamental and High-Speed Topics*. John Wiley & Sons, 2007.
- [91] M. Patel, "Enabling Effective Error Mitigation In Memory Chips That Use On-Die Error-Correcting Codes," Ph.D. dissertation, ETH Zürich, 2022.
- [92] H. Hassan, "Improving DRAM Performance, Reliability, and Security by Rigorously Understanding Intrinsic DRAM Operation," Ph.D. dissertation, ETH Zürich, 2022.
- [93] J. M. O'Connor, "Energy Efficient High Bandwidth DRAM for Throughput Processors," Ph.D. dissertation, The University of Texas at Austin, 2021.
- [94] K. Itoh, "Semiconductor Memory," 1977, U.S. Patent 4,044,340.
- [95] SAFARI Research Group, "DRAM Bender — GitHub Repository," <https://github.com/CMU-SAFARI/DRAM-Bender>, 2022.
- [96] Xilinx Inc., "Xilinx Alveo U200 FPGA Board," <https://www.xilinx.com/products/boards-and-kits/alveo/u200.html>.
- [97] Maxwell, "FT20X User Manual," <https://www.maxwell-fa.com/upload/files/base/8/m/311.pdf>.
- [98] H. Luo, A. Olgun, A. G. Yağlıkçı, Y. C. Tuğrul, S. Rhyner, M. B. Cavlak, J. Lindeger, M. Sadrosadati, and O. Mutlu, "RowPress: Amplifying Read Disturbance in Modern DRAM Chips," in *ISCA*, 2023.
- [99] J. S. Kim, M. Patel, A. G. Yağlıkçı, H. Hassan, R. Azizi, L. Orosa, and O. Mutlu, "Revisiting RowHammer: An Experimental Analysis of Modern Devices and Mitigation Techniques," in *ISCA*, 2020.
- [100] L. Orosa, A. G. Yağlıkçı, H. Luo, A. Olgun, J. Park, H. Hassan, M. Patel, J. S. Kim, and O. Mutlu, "A Deeper Look into RowHammer's Sensitivities: Experimental Analysis of Real DRAM Chips and Implications on Future Attacks and Defenses," in *MICRO*, 2021.
- [101] I. E. Yuksel, Y. C. Tugrul, A. Olgun, F. N. Bostancı, A. G. Yaglikci, G. F. Oliveira, H. Luo, J. G. Luna, M. Sadrosadati, and O. Mutlu, "Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis," arXiv, 2024.
- [102] F. Bai, S. Wang, X. Jia, Y. Guo, B. Yu, H. Wang, C. Lai, Q. Ren, and H. Sun, "A Low-Cost Reduced-Latency DRAM Architecture With Dynamic Reconfiguration of Row Decoder," *VLSI*, 2022.
- [103] M. A. Turi and J. G. Delgado-Frias, "High-Performance Low-Power Selective Precharge Schemes for Address Decoders," *TCS*, 2008.
- [104] M. Marazzi, F. Solt, P. Jattke, K. Takashi, and K. Razavi, "REGA: Scalable Rowhammer Mitigation with Refresh-Generating Activations," in *S&P*, 2023.
- [105] I. E. Yuksel, Y. C. Tugrul, F. N. Bostancı, A. G. Yaglikci, A. Olgun, G. F. Oliveira, M. Soysal, H. Luo, J. G. Luna, M. Sadrosadati, and O. Mutlu, "PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips," arXiv:2312.02880, 2023.
- [106] D. Lee, S. Khan, L. Subramanian, S. Ghose, R. Ausavarungnirun, G. Pekhimenko, V. Seshadri, and O. Mutlu, "Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms," in *SIGMETRICS*, 2017.
- [107] A. G. Yağlıkçı, A. Olgun, M. Patel, H. Luo, H. Hassan, L. Orosa, O. Ergin, and O. Mutlu, "Hira: Hidden Row Activation for Reducing Refresh Latency of Off-the-Shelf DRAM Chips," in *MICRO*, 2022.
- [108] A. Olgun, M. Osseiran, A. G. Yağlıkçı, Y. C. Tuğrul, H. Luo, S. Rhyner, B. Salami, J. Gomez Luna, and O. Mutlu, "An Experimental Analysis of RowHammer in HBM2 DRAM Chips," in *DSN*, 2023.
- [109] A. G. Yağlıkçı, H. Luo, G. F. Oliveira, A. Olgun, M. Patel, J. Park, H. Hassan, J. S. Kim, L. Orosa, and O. Mutlu, "Understanding RowHammer Under Reduced Wordline Voltage: An Experimental Study Using Real DRAM Devices," in *DSN*, 2022.
- [110] A. G. Yağlıkçı, G. F. Oliveira, Y. C. Tuğrul, I. E. Yuksel, A. Olgun, H. Luo, and O. Mutlu, "Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions," in *HPCA*, 2024.
- [111] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," in *ISCA*, 2014.
- [112] O. Mutlu and J. Kim, "RowHammer: A Retrospective," *IEEE TCAD Special Issue on Top Picks in Hardware and Embedded Security*, 2019.
- [113] O. Mutlu, A. Olgun, and A. G. Yaglikci, "Fundamentally Understanding and Solving RowHammer," in *ASP-DAC*, 2023.
- [114] L. Orosa, U. Rührmair, A. G. Yaglikci, H. Luo, A. Olgun, P. Jattke, M. Patel, J. Kim, K. Razavi, and O. Mutlu, "SpyHammer: Using RowHammer to Remotely Spy on Temperature," arXiv:2210.04084, 2022.
- [115] Z. Al-Ars, S. Hamdioui, and A. J. van de Goor, "Effects of Bit Line Coupling on the Faulty Behavior of DRAMs," in *VLSI*, 2004.
- [116] M. J. Lee and K. W. Park, "A Mechanism for Dependence of Refresh Time on Data Pattern in DRAM," *Electron Device Letters*, 2010.
- [117] Y. Nakagome, M. Aoki, S. Ikenaga, M. Horiguchi, S. Kimura, Y. Kawamoto, and K. Itoh, "The Impact of Data-line Interference Noise on DRAM Scaling," *JSSC*, 1988.
- [118] M. Redeker, B. F. Cockburn, and D. G. Elliott, "An Investigation into Crosstalk Noise in DRAM Structures," in *MTTDT*, 2002.
- [119] B. B. Talukder, J. Kerns, B. Ray, T. Morris, and M. T. Rahman, "Exploiting DRAM Latency Variations for Generating True Random Numbers," in *ICCE*, 2019.
- [120] C. Keller, F. Gurkaynak, H. Kaeslin, and N. Felber, "Dynamic Memory-based Physically Unclonable Function for the Generation of Unique Identifiers and True Random Numbers," in *ISCAS*, 2014.
- [121] S. Sutar, A. Raha, and V. Raghunathan, "D-PUF: An Intrinsically Reconfigurable DRAM PUF for Device Authentication in Embedded Systems," in *CASES*, 2016.
- [122] W. Xiong, A. Schaller, N. A. Anagnostopoulos, M. U. Saleem, S. Gabmeyer, S. Katzenbeisser, and J. Szefer, "Run-time Accessible DRAM PUFs in Commodity Devices," in *CHES*, 2016.
- [123] M. S. Hashemian, B. Singh, F. Wolff, D. Weyer, S. Clay, and C. Papachristou, "A Robust Authentication Methodology using Physically Unclonable Functions in DRAM Arrays," in *DATE*, 2015.
- [124] F. Tehranipoor, N. Karimian, W. Yan, and J. A. Chandy, "DRAM-based Intrinsic Physically Unclonable Functions for System-Level Security and Authentication," *VLSI*, 2016.
- [125] C. Eckert, F. Tehranipoor, and J. A. Chandy, "DRNG: DRAM-based Random Number Generation using Its Startup Value Behavior," in *MWSCAS*, 2017.
- [126] F. Tehranipoor, W. Yan, and J. A. Chandy, "Robust Hardware True Random Number Generators using DRAM Remanence Effects," in *HOST*, 2016.
- [127] R. Zhou, A. Roohi, D. Misra, and S. Angizi, "FlexiDRAM: A Flexible In-DRAM Framework to Enable Parallel General-Purpose Computation," in *ISLPED*, 2022.
- [128] G. F. Oliveira, A. Olgun, A. G. Yağlıkçı, F. N. Bostancı, J. Gómez-Luna, S. Ghose, and O. Mutlu, "MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing," in *HPCA*, 2024.