

# *ScRRAMBLe*

## Block Sparse Neural Network Architecture for Analog Compute-in Memory Accelerators

Vikrant Jaltare

Department of Bioengineering &  
Institute for Neural Computation  
UC San Diego

10<sup>th</sup> International Conference on Rebooting Computing 2025

# Case for RRAM\* based Compute-in Memory (CIM) Technologies

\*valid for conductance-based memristive technologies

Why a decades old architecture decision is impeding the power of AI computing

Most computers are based on the von Neumann architecture, which separates compute and memory. This has been perfect for conventional computing, but it's a traffic jam in AI computing.

IBM (2025)

Instead of re-engineering the von Neumann architecture, improvements in parallel processing and memory access have led to a new concept, in-memory computation. In-memory computation is based on the concept of Neuromorphic architecture by Carver Mead, where the data are located in the memory cells. This approach is similar to the computing scheme in the human brain, where information is processed in sparse networks of neurons and synapses, without any physical separation between computation and memory.<sup>15</sup> In-memory computation is a promising technology for AI computing.

Ielmini and Wong, *Nature Electronics* (2018)

## 1.1 Computing's Energy Problem (and what we can do about it)

Mark Horowitz

## 8. Conclusion

In summary, our challenge is clear: The drive for performance and the end of voltage scaling have made power, and not the number of transistors, the primary bottleneck in computing performance. To address this challenge, we will require the creation and effective use of new materials and will require the participation of industry partners. By playing our cards right and developing the right part of the design process, we will be able to continue to push the boundaries of computing devices.

## Neuromorphic Computing Market Size, Growth and Forecast

The neuromorphic computing market was valued at USD 28.5 million in 2024 and is projected to grow from USD 47.8 million in 2025 to USD 1,325.2 million by 2030, at a CAGR of 89.7% during the forecast period. This remarkable growth is primarily driven by the escalating demand for AI-based applications that mimic the brain's neural architecture, the increasing integration of neuromorphic computing in autonomous vehicles, robotics, and edge computing devices, and the convergence of quantum computing with neuromorphic systems. Leading industry players such as Intel, IBM, and BrainChip are pioneering advancements in neuromorphic processors, further propelling market expansion.

# RRAM Crossbar Arrays for Neural Networks



**Matrix-Vector Multiplication (MVM)**

**Ohm's Law** $\rightarrow$

$$I_j = \sum_i G_{ij} \Delta V_i$$
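As a rough illustration of the crossbar relation above, the following minimal sketch computes the column currents $I_j$ from a conductance matrix and a vector of row voltages. The function name `crossbar_mvm`, the array size, and all numeric values are made up for illustration; an ideal, noise-free array is assumed.

```python
def crossbar_mvm(G, dV):
    """Column currents I_j = sum_i G[i][j] * dV[i].

    Each device contributes G[i][j] * dV[i] (Ohm's law), and the
    contributions on a shared column wire add up.
    """
    n_rows, n_cols = len(G), len(G[0])
    if len(dV) != n_rows:
        raise ValueError("one voltage per crossbar row is required")
    return [sum(G[i][j] * dV[i] for i in range(n_rows)) for j in range(n_cols)]


# Hypothetical 2x3 array: conductances in siemens, row voltages in volts.
G = [[1e-6, 2e-6, 3e-6],
     [4e-6, 5e-6, 6e-6]]
dV = [0.5, 0.25]
I = crossbar_mvm(G, dV)  # one current per column
```

In the physical array every column is read out at once, which is where the $\mathcal{O}(1)$ MVM claim comes from; the loop here only mimics the arithmetic.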

## Neural Network Requirements

- Efficient MVM ($\mathcal{O}(n^2)$ or less)
- Large number of parameters
- Parameter updates → learning rules → outer product
- Mid- to low-precision weights

## CIM Characteristics

- Fast MVM ($\mathcal{O}(1)$)
- High density (but area constraint)
- Non-volatile storage → incremental outer product updates
- Analog multi-level storage

Good match! But...



# Challenges with using RRAM Accelerators

## Scaling



## Signed Weights



$$g_{ij} \in [g_{min}, g_{max}]$$

Positive and limited dynamic range

Differential encoding?

- More than one memory element per weight
- Mismatch issues

## Sparsity

Dropout $\rightarrow$ fine-grained sparsity

Requires off-chip digital controller $\rightarrow$ Latency

ScRRAMBLe → Framework to leverage these constraints as features for neural network design?



# Bird's eye view of ScRRAMBLe



# Signed Weights with Input Balancing



Offset canceling

Crossbar output:

$$y_i = \sum_j G_{ij} \tilde{x}_j$$

Signed neural net weights (target):

$$y_i = \sum_j aW_{ij} \tilde{x}_j$$

If $a, b \in \mathbb{R}$, $W_{ij}$ is signed, and

$$G_{ij} = a(W_{ij} + b)$$

then $y_i = a\sum_j W_{ij} \tilde{x}_j + ab \sum_j \tilde{x}_j$, so the two outputs agree when

$$ab \sum_j \tilde{x}_j = 0$$
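The offset-canceling condition above can be checked numerically. The sketch below is only an illustration: the `balance` helper (appending a negated copy of the input so the entries sum to zero) is a hypothetical way to obtain a zero-sum $\tilde{x}$ and is not necessarily the encoding used in ScRRAMBLe, and the weights, `a`, and `b` are made-up values.

```python
def balance(x):
    """Hypothetical balancing: append the negated copy so entries sum to zero."""
    return list(x) + [-v for v in x]


def to_conductance(W, a, b):
    """G_ij = a * (W_ij + b); the offset b shifts signed weights upward."""
    return [[a * (w + b) for w in row] for row in W]


def matvec(M, v):
    """y_i = sum_j M[i][j] * v[j]."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


W = [[0.5, -1.0, 0.25, -0.75],
     [-0.5, 1.0, -0.25, 0.5]]    # signed weights (made-up values)
a, b = 2.0, 1.0                  # scale and offset (made-up values)
G = to_conductance(W, a, b)      # every entry is non-negative for this W and b
x_tilde = balance([0.3, -0.2])   # [0.3, -0.2, -0.3, 0.2], sums to zero

y_crossbar = matvec(G, x_tilde)                 # what the array computes
y_signed = [a * y for y in matvec(W, x_tilde)]  # the signed-weight target
```

Because the entries of `x_tilde` sum to zero, the `a * b` offset term drops out and `y_crossbar` matches `y_signed` up to floating-point error, using a single non-negative conductance per weight.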

# Implementing Input-Balanced Routing



# ScRRAMBLe Architecture



# Performance of block-sparse networks



$\eta$  is constant over network size  $\Rightarrow$  some form of small-worldness

# Overcoming Communication Bottlenecks



## Assumptions

1. Packet-switched communication
2. Feedforward block-sparse layer
3. Address bits ( $A$ )  $\propto \log_2 (\#\text{chunks} \times \text{connection density})$
4. Data bits ( $D$ )  $\propto \text{chunk size}$

## Communication Overhead ( $O$ )

$$O(A, D) = \frac{A}{A + D}$$
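To get a feel for how the overhead $O(A, D)$ behaves, here is a small sketch under the assumptions listed above. The proportionality constants are simply set to 1, address bits are rounded up to whole bits, and the layer width of 1024, the connection density of 0.5, and the `bits_per_value` parameter are hypothetical choices not taken from the slides.

```python
import math


def address_bits(num_chunks, connection_density):
    """A ~ log2(#chunks * connection density), rounded up, at least 1 bit."""
    targets = num_chunks * connection_density
    return max(1, math.ceil(math.log2(targets)))


def data_bits(chunk_size, bits_per_value=1):
    """D ~ chunk size (bits_per_value is an assumed scaling constant)."""
    return chunk_size * bits_per_value


def overhead(A, D):
    """Fraction of each packet spent on addressing: O = A / (A + D)."""
    return A / (A + D)


# Sweep chunk size for a hypothetical 1024-wide layer at 50% connection density.
results = []
for chunk_size in (4, 16, 64):
    num_chunks = 1024 // chunk_size
    A = address_bits(num_chunks, connection_density=0.5)
    D = data_bits(chunk_size)
    results.append((chunk_size, A, D, overhead(A, D)))
    print(chunk_size, A, D, round(overhead(A, D), 3))
```

Under these assumptions larger chunks carry more data per packet and need fewer address bits, so the overhead shrinks as chunk size grows.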

\*Frankle, J. & Carbin, M. arXiv (2018).

# Quantized ScRRAMBLe Networks

## Post Training Quantization



- Full precision training
- Quantize ReLU activations post-training
- Error rate returns to near full precision with only about 4 bits<sup>†</sup>
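The post-training step listed above could look roughly like the sketch below. It assumes a plain uniform quantizer over a fixed range `[0, act_max]`; the slides do not specify the quantization scheme, so the function `quantize_activations`, the range, and the sample values are illustrative assumptions.

```python
def relu(x):
    return [max(0.0, v) for v in x]


def quantize_activations(acts, bits, act_max):
    """Uniformly quantize non-negative activations onto 2**bits levels in [0, act_max]."""
    levels = 2 ** bits - 1           # number of steps between 0 and act_max
    step = act_max / levels
    return [round(min(v, act_max) / step) * step for v in acts]


acts = relu([-1.2, 0.1, 0.8, 2.5, 3.9])                        # made-up pre-activations
q4 = quantize_activations(acts, bits=4, act_max=4.0)  # 16 levels
q1 = quantize_activations(acts, bits=1, act_max=4.0)  # 2 levels
```

With 4 bits the rounding error per activation is at most half a step (`act_max / 15 / 2`), whereas 1 bit collapses every value to either 0 or `act_max`.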

## Binary activation



- 1-bit (binary) activation during training
- Straight-through estimator
- Error rate falls to near the full-precision baseline
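A straight-through estimator for 1-bit activations can be sketched as a forward/backward pair. This is a minimal illustration assuming a Heaviside step at threshold 0 and the common clipped variant of the estimator (gradient passed through only where `|x| <= clip`); the slides do not say which variant was used, and the function names are hypothetical.

```python
def binarize(x, threshold=0.0):
    """Forward pass: 1-bit activation (Heaviside step)."""
    return [1.0 if v > threshold else 0.0 for v in x]


def ste_grad(x, grad_out, clip=1.0):
    """Backward pass: treat the step as identity, passing the upstream
    gradient through unchanged where |x| <= clip and zeroing it elsewhere."""
    return [g if abs(v) <= clip else 0.0 for v, g in zip(x, grad_out)]


pre_act = [-2.0, -0.3, 0.4, 1.5]     # made-up pre-activations
upstream = [0.1, 0.2, 0.3, 0.4]      # made-up gradients from the next layer
out = binarize(pre_act)              # [0.0, 0.0, 1.0, 1.0]
grad = ste_grad(pre_act, upstream)   # [0.0, 0.2, 0.3, 0.0]
```

The step function itself has zero gradient almost everywhere, so the surrogate backward pass is what lets training proceed with binary activations.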

<sup>†</sup>Wan, W. et al. *Nature* **608**, 504–512 (2022).

# Designing CIM accelerators with a new generation of neural networks

Insights for hardware design

1. Structured sparsity → efficient design.
2. Routing can be part of the compute and not just data transfer.
3. Weight sharing is effective in MVM, much like convolutions.

Insights into NN design

1. Block-sparsity as a regularizer.
2. Vector representations can eliminate expensive off-chip digital controllers + enable generative modeling
3. Block-sparse networks as a stand-in for FFNs.

How can we make neural networks ***block-sparse*** for CIM?
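As one possible reading of "block-sparse", the sketch below stores only the non-zero blocks of a weight matrix and multiplies each with the matching chunk of the input. The dictionary layout, the block size, and the idea that each stored block would occupy one crossbar tile are assumptions made for illustration, not a description of the ScRRAMBLe implementation.

```python
def block_sparse_matvec(blocks, x, block_size, n_out_blocks):
    """blocks maps (out_block, in_block) -> dense block_size x block_size matrix.

    Absent keys are all-zero blocks and cost no compute or storage.
    """
    y = [0.0] * (n_out_blocks * block_size)
    for (ob, ib), Wb in blocks.items():
        chunk = x[ib * block_size:(ib + 1) * block_size]  # input chunk routed to this block
        for r in range(block_size):
            y[ob * block_size + r] += sum(w * v for w, v in zip(Wb[r], chunk))
    return y


# Hypothetical layer: 3 input chunks, 2 output chunks, 3 of 6 blocks present.
blocks = {
    (0, 0): [[1.0, 0.0], [0.0, 1.0]],
    (0, 2): [[0.5, 0.5], [0.5, -0.5]],
    (1, 1): [[2.0, 0.0], [0.0, 2.0]],
}
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = block_sparse_matvec(blocks, x, block_size=2, n_out_blocks=2)
```

Sparsity here is decided per block rather than per weight, so the connectivity pattern reduces to a short list of (output chunk, input chunk) pairs.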

# Acknowledgements



## Funding Sources


