

# Dora

Processor Expressiveness  
Comes (Nearly) Free in  
Zero-Knowledge for  
RAM Programs



Aarushi Goel



Mathias Hall-Andersen



Gabriel Kaptchuk



# Zero-Knowledge Proofs [GMR85]



# Zero-Knowledge Proofs [GMR85]





# ZK for RAM Programs: Two Approaches

Transform source code representation of a RAM program directly into a circuit and then use a circuit-based zero-knowledge proof

```
1: static const char* SMALL_BOARD = "small.board.v11";
2: int* alloc_resources(const char* board_type) {
3:     int block_size;
4:     // The next line has a bug!!
5:     if (!strcmp(board_type, SMALL_BOARD, sizeof(SMALL_BOARD))) {
6:         block_size = 10;
7:     } else { block_size = 100; }
8:     return malloc(block_size * sizeof(int));
9: }
10: int incr_clock(const char* board_type, int* resources) {
11:     if (!strcmp(board_type, SMALL_BOARD, strlen(SMALL_BOARD))) {
12:         clock_loc = 0;
13:     } else { clock_loc = 64; }
14:     (*resources + clock_loc)++;
15:     return resources[clock_loc];
16: }
17: void snippet(const char* board_type) {
18:     int* res = alloc_resources(board_type);
    incr_clock(board_type, res);
}
```



Design custom zero-knowledge proofs to emulate RAM computations (eg. [BCG+13])

## TinyRAM Architecture Specification v2.000

Eli Ben-Sasson

StarkWare

Alessandro Chiesa

UC Berkeley

Daniel Genkin

University of Michigan

Eran Tromer

Tel Aviv University and Columbia University

Madars Virza

MIT

[eli@starkware.co](mailto:eli@starkware.co)

[alexch@berkeley.edu](mailto:alexch@berkeley.edu)

[genkin@umich.edu](mailto:genkin@umich.edu)

[tromer@cs.tau.ac.il](mailto:tromer@cs.tau.ac.il)

[madars@mit.edu](mailto:madars@mit.edu)

SCIPR Lab  
[scipr-lab.org](http://scipr-lab.org)

March 30, 2020

# Emulating RAM (in more detail)



Contains multiple instructions, one  
only of which is executed at each step

# Emulating RAM (in more detail)



## Need to Prove

Correctness of state update at each execution step

Computation depends on the size of all instructions  
[DIO21, YSWW21, WYYXW22, BMRS21]

Consistency of memory reads and writes

(Amortized) constant overhead per access. Large constants!  
[DdSGOTV22]

# Processor Expressiveness Trade-off

## Option 1: Minimal Instruction Set Architecture

### TinyRAM Architecture Specification v2.000

Eli Ben-Sasson   Alessandro Chiesa   Daniel Genkin   Eran Tromer   Madars Virza  
StarkWare   UC Berkeley   University of Michigan   Tel Aviv University and Columbia University   MIT  
[eli@starkware.co](mailto:eli@starkware.co)   [aalexch@berkeley.edu](mailto:aalexch@berkeley.edu)   [genkin@umich.edu](mailto:genkin@umich.edu)   [tromer@cs.tau.ac.il](mailto:tromer@cs.tau.ac.il)   [madars@mit.edu](mailto:madars@mit.edu)

SCIPR Lab  
[scipr-lab.org](http://scipr-lab.org)

March 30, 2020



Processor Circuit



Very few “wasted” gates



Longer trace to emulate missing instructions

## Option 2: Expressive Instruction Set Architecture



### Intel® 64 and IA-32 Architectures Software Developer's Manual

Combined Volumes:  
1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4

NOTE: This document contains all four volumes of the Intel 64 and IA-32 Architectures Software Developer's Manual: Basic Architecture, Order Number 253665; Instruction Set Reference A-Z, Order Number 325363; System Programming Guide, Order Number 325384; Model-Specific Registers, Order Number 335592. Refer to all four volumes when evaluating your design needs.



Reason over “real” trace, no instruction emulation



Lots of “wasted” gates

Is there a way out of this expressiveness trade-off?

# Dora (Our Result)



ZK for RAM Programs, where the **communication** and **computation** complexity of proving each step of an execution is **independent of the total number of supported instructions**.

$$\text{Complexity} = O(\#instructions + \#steps \times \text{size of one instruction})$$

Zero-Knowledge Bag (**ZKBag**) using **linearly homomorphic commitments**

Checking Memory Consistency using **ZKBag**

Checking Processor Execution using **ZKBag** and a **Folding Scheme**

Dora = Memory ZKBag + Processor ZKBag

Concurrent Work: [YHHKV23] and [YH24]

# Zero-Knowledge Bag



(Amortized) constant-time insertions and removals

# Zero-Knowledge Bag (Properties)

## Unique Removal



Elements can only be removed once!

## Ordered Binding



Only previously inserted elements can be removed!

# Zero-Knowledge Bag (Construction)

Insertion



Prover

$$com_1 = Com(val_1; op_1)$$

$$com_1$$



Verifier

$$tag_1 \leftarrow \mathbb{F}$$

$$tcom_1 = Com(tag_1; top_1)$$

$$tag_1$$

$$tcom_1 = Com(tag_1; top_1)$$

Removal

$$com_2 = Com(val_1; op_2)$$

$$tcom_2 = Com(tag_1; top_2)$$

$$com_2, tcom_2$$

At the End

Check that the bag is empty:

Demonstrate that insertion  
commitments are a permutation of  
removal commitments [Nef01]



# Checking Memory Consistency using ZKBag

Memory Initialization



Read and Write



Read Only



# Processor Execution using ZKBag

*Set of Instructions:*  $I = (Instr_1, \dots, Instr_n)$

We want to verify the following at every execution step:



**Strawman Approach:** Use ZKBag as a disjunction proof.

Then prove  $(state_i \xrightarrow{Instr_j} state_{i+1})$  using any ZK proof.



Need to prove state update over committed instruction

**Our Idea:** Use Nova [KST22] Folding Scheme!!

# Recap: Nova Folding Scheme [KST22]

R1CS

$$(\mathbf{A} \cdot \vec{z}) \circ (\mathbf{B} \cdot \vec{z}) = (\mathbf{C} \cdot \vec{z}), \text{ where } \vec{z} = (\vec{w} \parallel \vec{x} \parallel 1)$$

Extended R1CS

$$(\mathbf{A} \cdot \vec{z}) \circ (\mathbf{B} \cdot \vec{z}) = \mathbf{u} \cdot (\mathbf{C} \cdot \vec{z}) + \vec{e}, \text{ where } \vec{z} = (\vec{w} \parallel \vec{x} \parallel \mathbf{u})$$

## Folding Relaxed R1CS

Given  $\mathbf{A}, \mathbf{B}, \mathbf{C}, (\vec{z}_1, \vec{e}_1), (\vec{z}_2, \vec{e}_2)$  and a random  $r$

We can compute  $(\vec{\vec{z}}, \vec{\vec{e}})$ , such that

$$(\mathbf{A} \cdot \vec{z}) \circ (\mathbf{B} \cdot \vec{z}) = \mathbf{u} \cdot (\mathbf{C} \cdot \vec{z}) + \vec{e} \text{ if and only if}$$

$$(\mathbf{A} \cdot \vec{z}_1) \circ (\mathbf{B} \cdot \vec{z}_1) = u_1 \cdot (\mathbf{C} \cdot \vec{z}_1) + \vec{e}_1 \text{ and } (\mathbf{A} \cdot \vec{z}_2) \circ (\mathbf{B} \cdot \vec{z}_2) = u_2 \cdot (\mathbf{C} \cdot \vec{z}_2) + \vec{e}_2$$

# Processor Execution (Our Approach)

## Initialization



Initialize a relaxed R1CS accumulator for each instruction ( $Acc_1, \dots, Acc_n$ ).

## $i^{th}$ Execution Step: $state_i \xrightarrow{Instr_j} state_{i+1}$



Compute the extended witness to prove  $(state_i \xrightarrow{Instr_j} state_{i+1})$  and fold it onto  $Acc_j$  to compute updated accumulator  $Acc'_j$

## Verification



Retrieve all accumulators, and check if they verify

# Putting **Dora** Together



Memory Bag



Processor Bag

Initialize: Initialize Memory Bag and Processor Bag

Processor Tick:

- Read opcode from memory, Read/Write relevant memory address for op
- Fetch current instruction from processor bag and fold

Cleanup/Finish:

- Remove the current state of memory (opening as appropriate)
- Remove all instructions, randomize, and prove
- Verify bags are empty (and executed honestly)

# Benchmarks (2020 Laptop, Intel i7@ 2.30GHz)

| Mul. Per Clause | $2^3$   | $2^6$   | $2^9$   | $2^{12}$ | $2^{15}$ |
|-----------------|---------|---------|---------|----------|----------|
| $2^6$           | 1138.10 | 1090.46 | 1166.00 | 1123.02  | 663.91   |
| $2^9$           | 423.42  | 418.81  | 417.60  | 376.82   | 240.04   |
| $2^{12}$        | 144.03  | 140.27  | 147.69  | 133.81   | 86.00    |
| $2^{15}$        | 58.60   | 59.23   | 56.77   | 54.47    |          |

← No. of Instructions

No. of processor steps Dora proves per second averaged over 50,000 repetitions

Network Latency

| $t = 2^{23}$ | $2^{12}$ | $2^{14}$ | $2^{16}$ | $2^{18}$ | $2^{20}$ |
|--------------|----------|----------|----------|----------|----------|
| 0 ms         | 545387   | 545352   | 545210   | 547773   | 543268   |
| 10 ms        | 501591   | 513535   | 508092   | 517432   | 514070   |
| 100 ms       | 202618   | 204475   | 206942   | 200344   | 192602   |

← Memory Space

No. of memory operations Dora can check per second averaged over  $2^{23}$  operations

The code was written in Rust by Mathias and is compliant with the DARPA SIEVE specs

# Concurrent Works

## Batchman and Robin [YHHKV23]

Batchman and Robin:  
Batched and Non-batched Branching for Interactive ZK  
Yibin Yang\* David Heath† Carmit Hazay‡ Vladimir Kolesnikov§  
Muthuramakrishnan Venkitasubramaniam¶  
August 20, 2023

### Abstract

Vector Oblivious Linear Evaluation (VOLE) supports fast and scalable interactive Zero-Knowledge (ZK) proofs. Despite recent improvements to VOLE-based ZK, compiling proof statements to a control-flow oblivious form (e.g., a circuit) continues to lead to expensive proofs. One useful setting where this inefficiency stands out is when the statement is a disjunction of clauses  $\mathcal{L}_1 \vee \dots \vee \mathcal{L}_B$ . Typically, ZK requires paying the price to handle all  $B$  branches. Prior works have shown how to avoid this price in communication, but not in computation.

Our main result, **Batchman**, is asymptotically and concretely efficient VOLE-based ZK for batched disjunctions, i.e. statements containing  $R$  repetitions of the same disjunction. This is crucial for, e.g., emulating CPU steps in ZK. Our prover and verifier complexity is only  $\mathcal{O}(RB + R|\mathcal{C}| + B|\mathcal{C}|)$ , where  $|\mathcal{C}|$  is the maximum circuit size of the  $B$  branches. Prior works'

For Batched Disjunction

## Two Shuffles Make a RAM [YH24]

Two Shuffles Make a RAM:  
Improved Constant Overhead Zero Knowledge RAM  
Yibin Yang\* David Heath†  
November 3, 2023

### Abstract

We optimize Zero Knowledge (ZK) proofs of statements expressed as RAM programs over arithmetic values. Our arithmetic-circuit-based read/write memory uses only 4 input gates and 6 multiplication gates per memory access. This is an almost  $3\times$  total gate improvement over prior state of the art (Delpech de Saint Guilhem et al., SCN'22).

We implemented our memory in the context of ZK proofs based on vector oblivious linear evaluation (VOLE), and we further optimized based on techniques available in the VOLE setting. Our experiments show that (1) our total runtime improves over that of the prior best VOLE-ZK RAM (Franzese et al., CCS'21) by  $2\text{-}20\times$  and (2) on a typical hardware setup, we can achieve  $\approx 600K$  RAM accesses per second.

We also develop improved read-only memory and set ZK data structures. These are used internally in our read/write memory and improve over prior work.

For Checking Consistency of Memory Accesses

Can be combined to obtain a Zero-knowledge proof system for RAM Programs

Similar asymptotic complexities to Dora. Comparison between concrete efficiency is unclear

# Future Directions

Working with **Trail of Bits** team to benchmark x86 circuits on real traces

Understanding the concrete efficiency differences resulting from techniques used in Dora and [YHHKV23, YH24]

Can some combination of these different techniques yield better performance?

**Thanks!**