

input size → [ ]

size

## Computing (In-Memory) Architecture

2019

07/01/25



walls



### Power Wall

• Static power

standby mode (power consumption)

• dynamic power -

$$P = CN^2f$$

when will be in function

No. of

MOS

Scaling

High Dielectric Constant



2. Memory Wall | 3. (ILP) Wall

4. Technology Wall

5. Reliability Wall (Economically Not Ready)

6. Cost Wall

3 problems parallel



Paging

Only required block will be placed in memory







### Requirements:

1. Memory & Memory Controller (Compute Capable)
2. Design Processor chip with In Memory
3. Design of system and hardware interfaces.
4. Fixing of system software and languages.
5. Developing algorithms for CDM.
6. New Memory Architecture.



→ MIP → MIP

MAR → MAR

ABW → ABW

② Slave Co-operating

Copy → Bulk Copy

DMA

DMA Controller

(Processor, Interface)



## • Bulk DMA



## Cache Pollution

Cache is polluted due to bulk copy.

→ Implement bulk copy in-memory.



## Bulk Copy

### • Row Cloning



PIM

# CIM Model



- On-Memory Computing Hardware Accelerator for Data Intensive Application - Hachnani.



NVMes ←  
Modularity lost  
no overhead due to restoration.



Memistor. Acker's Logic (1970)



MIS

## In-Memory Array (IM-A)



(and switch) with bit of each word  
will be on  
same bit line

Peripherata transl.

~~private~~ Suitable for Analogs

## In Memory Computing in DRAM

The diagram illustrates a DRAM subarray structure. It features a central vertical column labeled "Transistor". To the left of this column, the word "Access" is written above a horizontal line labeled "WT". A second horizontal line, also labeled "WT", extends from the right side of the central column. The entire structure is enclosed within a rectangular border.





# Triple-Row Activation



bitline deviation at the end of charge sharing

$$8 = \frac{R C_c \cdot V_{DD} + C_b \cdot \frac{1}{2} V_{DD}}{3 C_c + C_b} - \frac{1}{2} V_{DD}$$

$$= \frac{(2R - 3) C_c}{6 C_c + 2 C_b} V_{DD}$$

$R \leftarrow$  Majority gate

$$C = 1 \rightarrow \underline{\underline{A + B}}$$

NOT

• Dual-Contact Cell



Row Clone Stop Bulk Copy and Initialization

RDM

$(STA) < 1 = 0$   
Pipeline

## Fast Parallel Mode

(src  $\xrightarrow{\text{bitline}} \text{dest}$ )

src  $\rightarrow$  buffer  $\rightarrow$  dest

almost (SA+bitline)

[src and dest to will be in  
same subarray]

## Pipeline Serial Mode (Row clone)

(PSM)

random num generation  $\leftarrow$  DRAM Computation

Pseudo Random Numbers

following DRAM Computation.

## MC Applications

## Pipeline Serial Mode (PSM)

16/01/25



a) Normal Mode



b) Rowclone

## Bulk Data Copy

- \* src & dest fall in different subarray but same bank
- \* Row mode used



- Bulk Initialization
- Bulk Zerocuring

as (violate)  
• ROC

• XOR & XNOR

In-Memory Computing SRAM

$$C = A \oplus B$$



Common Applications of DRAM & SRAM

(In Memory) And differences

## 6. ROC (Reduced Operating Cycles)



Only 1 can be written in C or D  
Only 0 can be written in D.



@OR operation

$$Y = A \cdot B$$

$$\underline{A\bar{B}} + \bar{A}B$$



(b) AND operation

$$Y = \underline{\overline{A \cdot B}} \\ = \overline{A + B}$$

$$\overline{\overline{A \cdot B}} \\ = \overline{\overline{A} \cdot \overline{B}}$$

XNGR

MARY ~~Richards~~ <sup>16</sup>

四

24

CAM

## In Memory Computing in SRAM



4 for storing data

## Access transistors

## CMOS Inverters



## Cross Coupled Inverters



pull up  $\rightarrow$  bitline prechargers, write driver

transistor and diode

written on ~~water~~ the cell will be

- What is available  written on bitline.



# Digital Computing



(X) 5



Single Ended SA Differential

rover

NOR

Bitline

1 1 0  
0 1

3-ip NAND  
NOR

## SRAM-CAM

$$wLi[i] = 0$$

$$wLj[j] = 1$$

$$wLk[k] =$$

## Search Operation

### Comparators



Area of Peripherals

Match bit

Match Register



Match Register



wLo*i*



Key(A)

Mask(M)

23/01/26



Search

Address

Write

Read

Reset

Write

Read

Address

Write

## 4x4 array of SRAM

| <u>Search Key</u> | <u>Memory Word</u> |
|-------------------|--------------------|
| 1                 | 0110               |
| 0                 | 0101               |
| 1                 | <u>00</u>          |
| 1                 | <u>00</u>          |

## Bit-Serial Arithmetic for Integers



1. Two operands mapped to same bit line  
→ Ripple Carry Adder.

(Peripheral logic)

① Read word lines (RWL)

② 2nd half cycle write is activated to store back  $(n+1)$  cycles



Array Peripherals

## SRAM Peripherals

MAP to Young - EXP



SRAM peripheral



0110

1010

0111

0111

MAC



- Sum ( $A \oplus B \oplus C_{in}$ )
- Carry ( $(A \& B) + (A \oplus B \oplus C_{in})$ )
- Bn-SRAM Multiplication



- ① Tag is an enable signal to bitline driver.
- ② When tag is 1, sum is written back to product array.

If tag is 1, multiplicand in the bitline is copied  
 $m^2 + 5n - 2$  cycles for n bits.

## Reduction





SRAM

Memristor

### Memristor

- MA. GIC
- IMPLY

### Hybrid CMOS

RRM

RRM

(SRAM + RRM) = (M) Carry Lookahead Adder

Akers Logic

Memristor based 2x2 binary multiplier

- DRAM

- N-Input MRL

- SRAM

OCA



across 8

HVR

MRAM

spiral bitcells - writing

- reading - MRAM

## Memristors

NVMe

[3.01.25]

- two terminal device
- Memory + Resistor

I, q, V,  $\phi$

③-terminal  
memristor  
 $TiO_2$

$TiO_2-\alpha$

$$d\phi = M \cdot dq$$

$$\text{Memristance } (M) = \left( \log \frac{d\phi}{dt} \right)_{q=0} \cdot \left( \frac{dq}{dt} \right)_{\phi=0} = \frac{v}{i}$$

Memristive device

NRAM

$R_{off} \gg R_{on}$

doped undoped



$R_{off}$  will be

if

$V_{DD} > V_{Threshold}$

logic design style

8 classes

- RVH
- RRM → Smider, Stateful logic
- MRM

## Stateful logic design

- IMPLY (Memory based material Implication)

- MAGIC (Memory aided logic)

- Memristive Array

Batched program  
MAGIC NOR

fast (MSB) → In1

V<sub>G</sub>



reverse bias high



$$0 \cdot 0 = 1$$

$$1 \cdot 1 = 0$$

Memristive Crossbar Structure

NAND



↓ high Resistance

final M

- N-Input NOR | Integration with CMOS Inverters

AND/OR



- CMOS NAND Gate

Memristor-based Gates

NOT

NOR

AND &  
OR

Implementation

$$\{FPM, RDM\} = \left\{ \frac{V_{DD} \times R_{ON}}{R_{ON} + R_{OFF}} \right\}$$

$V_{OUT} =$

$\begin{cases} \overline{VMM} \\ MVM \end{cases}$  Vector Matrix

$$Ax = b$$

$$x = A^{-1}b$$

In-Memory Matrix Multiplication



$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \underline{\quad}$$

$\alpha \leftarrow$  In terms of voltage

$\downarrow$   
mapped linearly to amplitudes or duration of read voltage.  
result  $B$  of computation resulting current measured along columns of the array.

Current is summed across each column.

$$i_j = \sum_{i=0}^n \frac{v_{ri}}{R(i,j)} = \sum_{i=0}^n v_{ri} C(i,j)$$

$v_{ri}$   $R(i,j)$   $C(i,j)$

Cell resistance      Conductance

$$I = V_I \times C$$

One read cycle is required.

In-Memory Sorting

Compare & Swap — Batcher  
(Bitonic)



Cyclic Rotation

$$a_0 \leq a_1 \leq a_2 \leq a_{n/2-1} \leq a_{n/2} \geq a_{n/2+1} \geq \dots$$

Bitonic Sort

$S_1 \rightarrow$  Ascending

$S_2 \rightarrow$  Descending

$$S_1 = \left\{ \min \{ a_0, a_{n/2} \}, \min \{ a_1, a_{n/2+1} \} \right\}$$

$$S_2 = \left\{ \min \{ a_0, a_n \}, \dots \right\}$$

$$\textcircled{1} \textcircled{5}, \textcircled{2}, \textcircled{9}, \textcircled{11}, \textcircled{13}, \textcircled{16}, \textcircled{8}, \textcircled{4}, \textcircled{2}, \textcircled{10} \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = A$$

$$S_1 = \{1, 5, 7, 4, 2\} \xrightarrow{\text{splitter} \rightarrow \text{swap}} \text{alt} \rightarrow \text{alt}$$

$S_2 = \{13, 10, 8, 9, 11\}$   $\rightarrow$  ~~merging~~  $\rightarrow$  ~~parallel~~ bitonic sequence

Comparing & Swapping  $\rightarrow$  ~~with selection~~  $\rightarrow$  A swap  $\rightarrow$  ~~versus all~~  $\rightarrow$  bitonic sequence generation

$$\text{number of } n = \left\{ \frac{n}{2}, \frac{n}{4}, \dots \right\}$$

$$\text{ex. } (13) \text{ D 10 } \xrightarrow{\text{splitter}} \frac{n}{2} = 6 \xrightarrow{\text{swap}} (13) \text{ 10 } \xrightarrow{\text{merge}} = 0$$

$$Y = 19, 2, 72, 3, 18, 57, 600, 100$$

②  $\left\{ \begin{array}{l} \text{left} \\ \text{key} \end{array} \right.$

$$DX_2^T = I$$

[3]  
3d  
2st

2st



Sorting in  $\leftarrow$  Memristive Memory

