

# **Mappings for DRRA**

## **Project course**

**Paul Delestrac - September 18th 2025**

# Objectives

Xina & Joel

Strikersoft 

- ResNet 18 ML model
  - image recognition

| layer name | output size | 18-layer                                                                                       | 34-layer                                                                                       | 50-layer                                                                                                         | 101-layer                                                                                                         | 152-layer                                                                                                         |
|------------|-------------|------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| conv1      | 112×112     |                                                                                                |                                                                                                | 7×7, 64, stride 2                                                                                                |                                                                                                                   |                                                                                                                   |
| conv2_x    | 56×56       | $\left[ \begin{smallmatrix} 3\times3, 64 \\ 3\times3, 64 \end{smallmatrix} \right] \times 2$   | $\left[ \begin{smallmatrix} 3\times3, 64 \\ 3\times3, 64 \end{smallmatrix} \right] \times 3$   | $\left[ \begin{smallmatrix} 1\times1, 64 \\ 3\times3, 64 \\ 1\times1, 256 \end{smallmatrix} \right] \times 3$    | $\left[ \begin{smallmatrix} 1\times1, 64 \\ 3\times3, 64 \\ 1\times1, 256 \end{smallmatrix} \right] \times 3$     | $\left[ \begin{smallmatrix} 1\times1, 64 \\ 3\times3, 64 \\ 1\times1, 256 \end{smallmatrix} \right] \times 3$     |
| conv3_x    | 28×28       | $\left[ \begin{smallmatrix} 3\times3, 128 \\ 3\times3, 128 \end{smallmatrix} \right] \times 2$ | $\left[ \begin{smallmatrix} 3\times3, 128 \\ 3\times3, 128 \end{smallmatrix} \right] \times 4$ | $\left[ \begin{smallmatrix} 1\times1, 128 \\ 3\times3, 128 \\ 1\times1, 512 \end{smallmatrix} \right] \times 4$  | $\left[ \begin{smallmatrix} 1\times1, 128 \\ 3\times3, 128 \\ 1\times1, 512 \end{smallmatrix} \right] \times 4$   | $\left[ \begin{smallmatrix} 1\times1, 128 \\ 3\times3, 128 \\ 1\times1, 512 \end{smallmatrix} \right] \times 8$   |
| conv4_x    | 14×14       | $\left[ \begin{smallmatrix} 3\times3, 256 \\ 3\times3, 256 \end{smallmatrix} \right] \times 2$ | $\left[ \begin{smallmatrix} 3\times3, 256 \\ 3\times3, 256 \end{smallmatrix} \right] \times 6$ | $\left[ \begin{smallmatrix} 1\times1, 256 \\ 3\times3, 256 \\ 1\times1, 1024 \end{smallmatrix} \right] \times 6$ | $\left[ \begin{smallmatrix} 1\times1, 256 \\ 3\times3, 256 \\ 1\times1, 1024 \end{smallmatrix} \right] \times 23$ | $\left[ \begin{smallmatrix} 1\times1, 256 \\ 3\times3, 256 \\ 1\times1, 1024 \end{smallmatrix} \right] \times 36$ |
| conv5_x    | 7×7         | $\left[ \begin{smallmatrix} 3\times3, 512 \\ 3\times3, 512 \end{smallmatrix} \right] \times 2$ | $\left[ \begin{smallmatrix} 3\times3, 512 \\ 3\times3, 512 \end{smallmatrix} \right] \times 3$ | $\left[ \begin{smallmatrix} 1\times1, 512 \\ 3\times3, 512 \\ 1\times1, 2048 \end{smallmatrix} \right] \times 3$ | $\left[ \begin{smallmatrix} 1\times1, 512 \\ 3\times3, 512 \\ 1\times1, 2048 \end{smallmatrix} \right] \times 3$  | $\left[ \begin{smallmatrix} 1\times1, 512 \\ 3\times3, 512 \\ 1\times1, 2048 \end{smallmatrix} \right] \times 3$  |
|            | 1×1         |                                                                                                |                                                                                                | average pool, 1000-d fc, softmax                                                                                 |                                                                                                                   |                                                                                                                   |
| FLOPs      |             | $1.8 \times 10^9$                                                                              | $3.6 \times 10^9$                                                                              | $3.8 \times 10^9$                                                                                                | $7.6 \times 10^9$                                                                                                 | $11.3 \times 10^9$                                                                                                |

ures for ImageNet. Building blocks are shown in brackets (see also Fig. 5), with the numbers of block

Reid & Yurun

AtlasCopco



# **Development flow overview**

# **Development flow overview**

## **1. Establish a reference implementation**

# Development flow overview

1. Establish a **reference implementation**
2. Partition the reference implementation into **algorithms**

# Development flow overview

1. Establish a **reference implementation**
2. Partition the reference implementation into **algorithms**
3. For each algorithm:

# Development flow overview

1. Establish a **reference implementation**
2. Partition the reference implementation into **algorithms**
3. For each algorithm:
  - A. Decide and implement a **memory layout** for IO buffers

# Development flow overview

1. Establish a **reference implementation**
2. Partition the reference implementation into **algorithms**
3. For each algorithm:
  - A. Decide and implement a **memory layout** for IO buffers
  - B. Implement a C++ model of the algorithm (**model 0**)

# Development flow overview

1. Establish a **reference implementation**
2. Partition the reference implementation into **algorithms**
3. For each algorithm:
  - A. Decide and implement a **memory layout** for IO buffers
  - B. Implement a C++ model of the algorithm (**model 0**)
  - C. Compose a **DRRA fabric** to run the algorithm  
*(If new hardware is needed, create a new **DRRA component**)*

# Development flow overview

1. Establish a **reference implementation**
2. Partition the reference implementation into **algorithms**
3. For each algorithm:
  - A. Decide and implement a **memory layout** for IO buffers
  - B. Implement a C++ model of the algorithm (**model 0**)
  - C. Compose a **DRRA fabric** to run the algorithm  
*(If new hardware is needed, create a new **DRRA component**)*
  - D. Write a **PASM implementation** of the algorithm for the fabric

# Development flow overview

1. Establish a **reference implementation**
2. Partition the reference implementation into **algorithms**
3. For each algorithm:
  - A. Decide and implement a **memory layout** for IO buffers
  - B. Implement a C++ model of the algorithm (**model 0**)
  - C. Compose a **DRRA fabric** to run the algorithm  
*(If new hardware is needed, create a new **DRRA component**)*
  - D. Write a **PASM implementation** of the algorithm for the fabric
  - E. Validate the functionality by running the **tests**
    - ✓ **model 0** (C++)
    - ✓ **model 2** (SST)
    - ✓ **model 3** (RTL)

# **1. Reference implementation**

# Why do we need it?

## Reference implementation



# Why do we need it?

## Reference implementation



1. Guide the development of the SiLago design
2. Validate functionality of the SiLago design
3. Compare the latency (and power)

# Why do we need it?

## Reference implementation



1. Guide the development of the SiLago design
2. Validate functionality of the SiLago design
3. Compare the latency (and power)
4. For trainable applications (e.g., ML), SiLago inherits params

# What is a good reference model?

## Reference implementation

- *The reference model is an implementation of the application with **clearly identified algorithms and its inputs/outputs***

# What is a good reference model?

## Reference implementation

- *The reference model is an implementation of the application with **clearly identified algorithms and its inputs/outputs***

```
model = tf.keras.applications.ResNet50()
```



**Not sufficiently detailed**

# What is a good reference model?

## Reference implementation

- *The reference model is an implementation of the application with **clearly identified algorithms and its inputs/outputs***

```
model = tf.keras.applications.ResNet50()
```



**Not sufficiently detailed**

```
model = models.Sequential()
model.add(layers.Conv2D(6, 5, activation='tanh',
                      input_shape=x_train.shape[1:]))
model.add(layers.AveragePooling2D(2))
model.add(layers.Activation('sigmoid'))
model.add(layers.Conv2D(16, 5, activation='tanh'))
model.add(layers.AveragePooling2D(2))
model.add(layers.Activation('sigmoid'))
model.add(layers.Conv2D(120, 5, activation='tanh'))
model.add(layers.Flatten())
model.add(layers.Dense(84, activation='tanh'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()
```



**Good enough**

- *This code can run*
- *Layers are easy to identify*
- *Inputs and outputs can be extracted*

# Data types

## Reference implementation

- Data types (or data formats) directly influences which hardware resources you will choose when mapping
- Currently, DRRA has support for few data types
  - *Check if your application can be implemented with these types*

### Note

You can do fixed-point, but

- Limited dynamic range
- Consider scaling and overflow
- Manage the decimal point

| Data type             | Description                            |
|-----------------------|----------------------------------------|
| bfloat16              | 16-bit bfloat (brain floating point).  |
| bool                  | Boolean.                               |
| complex128            | 128-bit complex.                       |
| complex64             | 64-bit complex.                        |
| <b>float16 (half)</b> | <b>16-bit floating-point.</b>          |
| float32               | 32-bit floating-point.                 |
| float64 (double)      | 64-bit floating-point.                 |
| <b>int16</b>          | <b>Signed 16-bit integer.</b>          |
| int32                 | Signed 32-bit integer.                 |
| int64                 | Signed 64-bit integer.                 |
| int8                  | Signed 8-bit integer.                  |
| qint16                | Signed quantized 16-bit integer.       |
| qint32                | signed quantized 32-bit integer.       |
| qint8                 | Signed quantized 8-bit integer.        |
| quint16               | Unsigned quantized 16-bit int.         |
| quint8                | Unsigned quantized 8-bit int.          |
| <b>uint16</b>         | <b>Unsigned 16-bit (word) integer.</b> |
| uint32                | Unsigned 32-bit (dword) integer.       |
| uint64                | Unsigned 64-bit (qword) integer.       |
| uint8                 | Unsigned 8-bit (byte) integer.         |

*supported*

*supported soon*

not supported yet

*quantized*

# Example: Simple CNN

## Reference implementation



## **2. Application partitioning**

# What is application partitioning?

## Application partitioning



# How to partition the application?

## Application partitioning

There is an **infinite ( $\infty$ )** amount of possible partitioning for a given application

*But here are some considerations...*

- It is easier to **merge two nodes into one** than **divide one node into two**



So don't make your nodes too big

# Example: LeNet-5

## Application partitioning



# Example: LeNet-5

## Application partitioning



# Example: LeNet-5

## Application partitioning



# Example: LeNet-5

## Application partitioning



*Which partitioning option is the best\*? Why?*

# Example: LeNet-5

## Application partitioning



Option 1  
3 nodes



Option 2  
6 nodes



*Which partitioning option is the best\*? Why?*

# **3. Algorithm mapping**

# 3. Algorithm mapping

- A. Memory layout
- B. High-level modeling
- C. Binding
- D. Assembly implementation
- E. Testing

# 3. Algorithm mapping

## Overview



# 3. Algorithm mapping

## Overview



- At this point, **for each algorithm**, you must know

# 3. Algorithm mapping

## Overview



- At this point, **for each algorithm**, you must know
  - The expected behavior
    - e.g., what computation alg1 and alg2 are supposed to do

# 3. Algorithm mapping

## Overview



- At this point, **for each algorithm**, you must know
  - The expected behavior
    - e.g., what computation alg1 and alg2 are supposed to do
  - The shape and data types of inputs/outputs
    - e.g., what are the types and dimensions of data0, data1 and data2

# 3. Algorithm mapping

## Overview



# 3. Algorithm mapping

## Overview

- B. High-level modeling
- C. Binding
- D. PASM implementation
- E. Testing

### A. Memory layout



# A. Memory Layout



# A. Memory Layout



- Decide how to map the input data of the algorithm to the buffer

# A. Memory Layout



- Decide how to map the input data of the algorithm to the buffer
  - consider the data type (bit precision)

# A. Memory Layout



- Decide how to map the input data of the algorithm to the buffer
  - consider the data type (bit precision)
  - do padding if needed

# A. Memory Layout



- Decide how to map the input data of the algorithm to the buffer
  - consider the data type (bit precision)
  - do padding if needed
  - the input buffer is virtual so you can consider it **infinitely long\***

# Example: 32x32 matmul

## A. Memory layout



# Example: 32x32 matmul

## A. Memory layout



- Consider a 32x32 matmul operation (int16)

# Example: 32x32 matmul

## A. Memory layout



- Consider a 32x32 matmul operation (int16)
  - What is the input size?

# Example: 32x32 matmul

## A. Memory layout



- Consider a 32x32 matmul operation (int16)
  - What is the input size?
    - ▶  $2 \times 32 \times 32 = 2048w = 32768b$

# Example: 32x32 matmul

## A. Memory layout



- Consider a 32x32 matmul operation (int16)
  - What is the input size?
    - ▶  $2 \times 32 \times 32 = 2048w = 32768b$
  - What is the size of H?

# Example: 32x32 matmul

## A. Memory layout



- Consider a 32x32 matmul operation (int16)
  - What is the input size?
    - ▶  $2 \times 32 \times 32 = 2048w = 32768b$
  - What is the size of H?
    - ▶  $(2 \times 32 \times 32 \times 16) / (256/16)$   
= 2048 rows

# Example: LeNet-5

## A. Memory layout



# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024W$

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024W$
    - ▶ filter:  $5 \times 5 \times 6 = 150W$

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024W$
    - ▶ filter:  $5 \times 5 \times 6 = 150W$
  - What is the size of H?

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024\text{W}$
    - ▶ filter:  $5 \times 5 \times 6 = 150\text{W}$
  - What is the size of H?
    - ▶ image:  $1024 / 16 = 64$  rows

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024\text{w}$
    - ▶ filter:  $5 \times 5 \times 6 = 150\text{w}$
  - What is the size of H?
    - ▶ image:  $1024 / 16 = 64 \text{ rows}$
    - ▶ kernel:  $150 / 16 = 9.375 \text{ rows?}$

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024\text{w}$
    - ▶ filter:  $5 \times 5 \times 6 = 150\text{w}$
  - What is the size of H?
    - ▶ image:  $1024 / 16 = 64 \text{ rows}$
    - ▶ kernel:  $150 / 16 = 9.375 \text{ rows?}$   
*what should we do? 10 rows?*

# Example: LeNet-5

## A. Memory layout

# Kernels for 1st layer



## Input buffer

A blank 10x10 grid on a white background. The grid consists of 100 equal-sized squares arranged in a single row and column. The horizontal and vertical lines intersect to form the grid. There are no numbers or other markings on the grid.

# Example: LeNet-5

## A. Memory layout



# Example: LeNet-5

## A. Memory layout



*Option 1*



# Example: LeNet-5

## A. Memory layout



*Option 1*



*Option 2*



# Example: LeNet-5

## A. Memory layout



*Option 1*



*Option 2*



*Which layout option is the best? Why?*

# Example: LeNet-5

## A. Memory layout



*Option 1*



*Option 2*



*Which layout option is the best? Why?*

# Example: LeNet-5

## A. Memory layout



# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024W$
    - ▶ filter:  $5 \times 5 \times 6 = 150W$
  - What is the size of H?
    - ▶ image:  $1024 / 16 = 64$  rows
    - ▶ kernel: 12 rows

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024W$
    - ▶ filter:  $5 \times 5 \times 6 = 150W$
  - What is the size of H?
    - ▶ image:  $1024 / 16 = 64$  rows
    - ▶ kernel: 12 rows
  - What is the output size?

# Example: LeNet-5

## A. Memory layout



- First layer of LeNet-5 (CONV2D - int16)
  - What is the input size?
    - ▶ image:  $32 \times 32 \times 1 = 1024W$
    - ▶ filter:  $5 \times 5 \times 6 = 150W$
  - What is the size of H?
    - ▶ image:  $1024/16 = 64$  rows
    - ▶ kernel: 12 rows
  - What is the output size?
    - ▶ output:  $28 \times 28 \times 6 / 16 = 294$  rows

# Write model 0

## High-level modeling



# Write model 0

## High-level modeling



# Write model 0

## High-level modeling



- Write a C++ code that implements the algorithm (model 0)

# Write model 0

## High-level modeling



- Write a C++ code that implements the algorithm (model 0)
- Taking the input memory layout and producing the output

# Write model 0

## High-level modeling



- Write a C++ code that implements the algorithm (model 0)
- Taking the input memory layout and producing the output
- This code does not need to be cycle-accurate

# Write model 0

## High-level modeling



- Write a C++ code that implements the algorithm (model 0)
- Taking the input memory layout and producing the output
- This code does not need to be cycle-accurate
- This code will be a **functional reference** for the DRRA implementation

# Write model 0

## High-level modeling



```
/* Model 0 for 1DConv32 */
void model_l0(IO &input_buffer, IO &output_buffer) {
    // Read the input buffer to A.
    vector<int16_t> a = input_buffer.read<int16_t>(1, 2);
    // Read the input buffer to B.
    vector<int16_t> b = input_buffer.read<int16_t>(0, 1);

    // Add A and B
    vector<int16_t> c(32);
    for (int i = 0; i < 30; i++) {
        int sum = 0;
        for (int j = 0; j < 3; j++) {

            sum += a[i + j] * b[j];
        }
        c[i + 1] = sum;
    }

    // Write the result C to the output buffer
    output_buffer.write<int16_t>(0, 2, c);
}
```

# Write model 0

## High-level modeling



```
/* Model 0 for 1DConv32 */
void model_l0(IO &input_buffer, IO &output_buffer) {
    // Read the input buffer to A.
    vector<int16_t> a = input_buffer.read<int16_t>(1, 2);
    // Read the input buffer to B.
    vector<int16_t> b = input_buffer.read<int16_t>(0, 1);

    // Add A and B
    vector<int16_t> c(32);
    for (int i = 0; i < 30; i++) {
        int sum = 0;
        for (int j = 0; j < 3; j++) {

            sum += a[i + j] * b[j];
        }
        c[i + 1] = sum;
    }

    // Write the result C to the output buffer
    output_buffer.write<int16_t>(0, 2, c);
}
```

# Write model 0

## High-level modeling



# Write model 0

## High-level modeling



# Verification framework

## High-level modeling



# Where to map the algorithm?

## Binding



# Compose a DRRA fabric

## Mapping - Binding

- You can compose a DRRA fabric by choosing resources from the **component library**



# **Basic DRRA fabric**

## **Mapping - Binding**

# **Basic DRRA fabric**

## **Mapping - Binding**

- The default DRRA fabric that we use is a 3 cells 1 column that contains

# Basic DRRA fabric

## Mapping - Binding

- The default DRRA fabric that we use is a 3 cells 1 column that contains
  - Cell0: SRAM to load from input buffer



# Basic DRRA fabric

## Mapping - Binding

- The default DRRA fabric that we use is a 3 cells 1 column that contains
  - Cell0: SRAM to load from input buffer
  - Cell1: 1 DPU to compute and 3 RF to hold values



# Basic DRRA fabric

## Mapping - Binding

- The default DRRA fabric that we use is a 3 cells 1 column that contains
  - Cell0: SRAM to load from input buffer
  - Cell1: 1 DPU to compute and 3 RF to hold values
  - Cell2: SRAM to unload to output buffer



# Basic DRRA fabric

## Mapping - Binding

- The default DRRA fabric that we use is a 3 cells 1 column that contains
  - Cell0: SRAM to load from input buffer
  - Cell1: 1 DPU to compute and 3 RF to hold values
  - Cell2: SRAM to unload to output buffer
- Adding more columns can be used to add degrees of parallelism
  - Example: LeNet-5 first layer had 6 kernels that could be parallelized on 6 columns



# Basic DRRA fabric

## Mapping - Binding

- The default DRRA fabric that we use is a 3 cells 1 column that contains
  - Cell0: SRAM to load from input buffer
  - Cell1: 1 DPU to compute and 3 RF to hold values
  - Cell2: SRAM to unload to output buffer
- Adding more columns can be used to add degrees of parallelism
  - Example: LeNet-5 first layer had 6 kernels that could be parallelized on 6 columns
- Each cell has 16 slots max
  - Note that not all resources are 1 slot (e.g., iosram)



# Compose a fabric

## fabric JSON description

```
{
  "platform": "drra",
  "resources": [
    {
      "name": "swb_impl",
      "kind": "swb"
    },
    {
      "name": "iosram_top_impl",
      "kind": "iosram_top"
    },
    {
      "name": "iosram_btm_impl",
      "kind": "iosram_btm"
    },
    {
      "name": "rf_impl",
      "kind": "rf"
    },
    {
      "name": "dpu_impl",
      "kind": "dpu"
    }
  ],
  "controllers": [
    {
      "name": "sequencer_impl",
      "kind": "sequencer"
    }
  ],
  "cells": [
    {
      "name": "cell_top_impl",
      "kind": "cell_top",
      "controller": "sequencer_impl",
      "resource_list": [
        "swb_impl",
        "iosram_top_impl"
      ]
    },
    {
      "name": "cell_mid_impl",
      "kind": "cell_mid",
      "controller": "sequencer_impl",
      "resource_list": [
        "swb_impl",
        "rf_impl",
        "rf_impl",
        "rf_impl",
        "dpu_impl"
      ]
    },
    {
      "name": "cell_btm_impl",
      "kind": "cell_btm",
      "controller": "sequencer_impl",
      "resource_list": [
        "swb_impl",
        "iosram_btm_impl"
      ]
    }
  ]
}
```

```
"fabric": {
  "height": 3,
  "width": 1,
  "cells_list": [
    {
      "coordinates": [
        {
          "row": 0,
          "col": 0
        }
      ],
      "cell_name": "cell_top_impl"
    },
    {
      "coordinates": [
        {
          "row": 1,
          "col": 0
        }
      ],
      "cell_name": "cell_mid_impl"
    },
    {
      "coordinates": [
        {
          "row": 2,
          "col": 0
        }
      ],
      "cell_name": "cell_btm_impl"
    }
  ],
  "custom_properties": [
    {
      "name": "IO_DATA_WIDTH",
      "value": 256
    },
    {
      "name": "IO_ADDR_WIDTH",
      "value": 16
    },
    {
      "name": "RESOURCE_INSTR_WIDTH",
      "value": 27
    },
    {
      "name": "ROWS",
      "value": 3
    },
    {
      "name": "COLS",
      "value": 1
    },
    {
      "name": "INSTR_DATA_WIDTH",
      "value": 32
    },
    {
      "name": "INSTR_ADDR_WIDTH",
      "value": 6
    },
    {
      "name": "INSTR_HOPS_WIDTH",
      "value": 4
    },
    {
      "name": "INSTR_OPCODE_BITWIDTH",
      "value": 3
    },
    {
      "name": "BULK_WIDTH",
      "value": 256
    }
  ],
  "interface": {
    "input_buffer_depth": 1024,
    "output_buffer_depth": 1024
  }
}
```

Input Buffer

[0,0]

sequencer\_impl



Input Buffer

[1,0]

sequencer\_impl



[2,0]

sequencer\_impl



Output Buffer

# Compose a fabric

## fabric JSON description

```
{  
  "platform": "drra",  
  "resources": [  
    {  
      "name": "swb_impl",  
      "kind": "swb"  
    },  
    {  
      "name": "iosram_top_impl",  
      "kind": "iosram_top"  
    },  
    {  
      "name": "iosram_btm_impl",  
      "kind": "iosram_btm"  
    },  
    {  
      "name": "rf_impl",  
      "kind": "rf"  
    },  
    {  
      "name": "dpu_impl",  
      "kind": "dpu"  
    }  
  ],  
  "controllers": [  
    {  
      "name": "sequencer_impl",  
      "kind": "sequencer"  
    }  
  ],  
  "cells": [  
    {  
      "name": "cell_top_impl",  
      "kind": "cell_top",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_top_impl"  
      ]  
    },  
    {  
      "name": "cell_mid_impl",  
      "kind": "cell_mid",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "rf_impl",  
        "rf_impl",  
        "rf_impl",  
        "dpu_impl"  
      ]  
    },  
    {  
      "name": "cell_btm_impl",  
      "kind": "cell_btm",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_btm_impl"  
      ]  
    }  
  ]  
}
```

1. List the components you want to use

```
"fabric": {  
  "height": 3,  
  "width": 1,  
  "cells_list": [  
    {  
      "coordinates": [  
        {  
          "row": 0,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_top_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 1,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_mid_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 2,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_btm_impl"  
    }  
  ],  
  "custom_properties": [  
    {  
      "name": "IO_DATA_WIDTH",  
      "value": 256  
    },  
    {  
      "name": "IO_ADDR_WIDTH",  
      "value": 16  
    },  
    {  
      "name": "RESOURCE_INSTR_WIDTH",  
      "value": 27  
    },  
    {  
      "name": "ROWS",  
      "value": 3  
    },  
    {  
      "name": "COLS",  
      "value": 1  
    },  
    {  
      "name": "INSTR_DATA_WIDTH",  
      "value": 32  
    },  
    {  
      "name": "INSTR_ADDR_WIDTH",  
      "value": 6  
    },  
    {  
      "name": "INSTR_HOPS_WIDTH",  
      "value": 4  
    },  
    {  
      "name": "INSTR_OPCODE_BITWIDTH",  
      "value": 3  
    },  
    {  
      "name": "BULK_WIDTH",  
      "value": 256  
    }  
  ],  
  "interface": {  
    "input_buffer_depth": 1024,  
    "output_buffer_depth": 1024  
  }  
}
```

Input Buffer

[0,0]

sequencer\_impl

|    |                 |
|----|-----------------|
| 10 |                 |
| 17 |                 |
| 13 |                 |
| 11 |                 |
| 9  |                 |
| 7  |                 |
| 3  |                 |
| 16 | iosram_top_impl |
| 12 |                 |
| 10 |                 |
| 8  |                 |
| 6  |                 |
| 4  |                 |
| 2  | swb_impl        |

[1,0]

sequencer\_impl

|    |          |
|----|----------|
| 23 |          |
| 21 | dpu_impl |
| 19 |          |
| 17 | rf_impl  |
| 15 |          |
| 13 | rf_impl  |
| 11 |          |
| 9  | rf_impl  |
| 7  |          |
| 5  | swb_impl |
| 20 |          |
| 18 |          |
| 16 |          |
| 14 |          |
| 12 |          |
| 10 |          |
| 8  |          |
| 6  |          |
| 4  |          |
| 2  |          |

[2,0]

sequencer\_impl

|    |                 |
|----|-----------------|
| 19 |                 |
| 17 |                 |
| 15 |                 |
| 13 |                 |
| 11 |                 |
| 9  |                 |
| 7  |                 |
| 3  |                 |
| 18 | iosram_btm_impl |
| 16 |                 |
| 14 |                 |
| 12 |                 |
| 10 |                 |
| 8  |                 |
| 6  |                 |
| 4  |                 |
| 2  | swb_impl        |

Output Buffer

# Compose a fabric

## fabric JSON description

```
{  
  "platform": "drra",  
  "resources": [  
    {  
      "name": "swb_impl",  
      "kind": "swb"  
    },  
    {  
      "name": "iosram_top_impl",  
      "kind": "iosram_top"  
    },  
    {  
      "name": "iosram_btm_impl",  
      "kind": "iosram_btm"  
    },  
    {  
      "name": "rf_impl",  
      "kind": "rf"  
    },  
    {  
      "name": "dpu_impl",  
      "kind": "dpu"  
    }  
  ],  
  "controllers": [  
    {  
      "name": "sequencer_impl",  
      "kind": "sequencer"  
    }  
  ],  
  "cells": [  
    {  
      "name": "cell_top_impl",  
      "kind": "cell_top",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_top_impl"  
      ]  
    },  
    {  
      "name": "cell_mid_impl",  
      "kind": "cell_mid",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "rf_impl",  
        "rf_impl",  
        "rf_impl",  
        "dpu_impl"  
      ]  
    },  
    {  
      "name": "cell_btm_impl",  
      "kind": "cell_btm",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_btm_impl"  
      ]  
    }  
  ]  
}
```

1. List the components you want to use

2. Compose cells with those components

```
"fabric": {  
  "height": 3,  
  "width": 1,  
  "cells_list": [  
    {  
      "coordinates": [  
        {  
          "row": 0,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_top_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 1,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_mid_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 2,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_btm_impl"  
    }  
  ],  
  "custom_properties": [  
    {  
      "name": "IO_DATA_WIDTH",  
      "value": 256  
    },  
    {  
      "name": "IO_ADDR_WIDTH",  
      "value": 16  
    },  
    {  
      "name": "RESOURCE_INSTR_WIDTH",  
      "value": 27  
    },  
    {  
      "name": "ROWS",  
      "value": 3  
    },  
    {  
      "name": "COLS",  
      "value": 1  
    },  
    {  
      "name": "INSTR_DATA_WIDTH",  
      "value": 32  
    },  
    {  
      "name": "INSTR_ADDR_WIDTH",  
      "value": 6  
    },  
    {  
      "name": "INSTR_HOPS_WIDTH",  
      "value": 4  
    },  
    {  
      "name": "INSTR_OPCODE_BITWIDTH",  
      "value": 3  
    },  
    {  
      "name": "BULK_WIDTH",  
      "value": 256  
    }  
  ],  
  "interface": {  
    "input_buffer_depth": 1024,  
    "output_buffer_depth": 1024  
  }  
}
```

Input Buffer

[0,0]

sequencer\_impl

|     |                 |     |
|-----|-----------------|-----|
| 10  | iosram_top_impl | 16  |
| 11  |                 | 12  |
| 12  |                 | 13  |
| 13  |                 | 14  |
| 14  |                 | 15  |
| 15  |                 | 16  |
| 16  |                 | 17  |
| 17  |                 | 18  |
| 18  |                 | 19  |
| 19  |                 | 20  |
| 20  |                 | 21  |
| 21  |                 | 22  |
| 22  |                 | 23  |
| 23  |                 | 24  |
| 24  |                 | 25  |
| 25  |                 | 26  |
| 26  |                 | 27  |
| 27  |                 | 28  |
| 28  |                 | 29  |
| 29  |                 | 30  |
| 30  |                 | 31  |
| 31  |                 | 32  |
| 32  |                 | 33  |
| 33  |                 | 34  |
| 34  |                 | 35  |
| 35  |                 | 36  |
| 36  |                 | 37  |
| 37  |                 | 38  |
| 38  |                 | 39  |
| 39  |                 | 40  |
| 40  |                 | 41  |
| 41  |                 | 42  |
| 42  |                 | 43  |
| 43  |                 | 44  |
| 44  |                 | 45  |
| 45  |                 | 46  |
| 46  |                 | 47  |
| 47  |                 | 48  |
| 48  |                 | 49  |
| 49  |                 | 50  |
| 50  |                 | 51  |
| 51  |                 | 52  |
| 52  |                 | 53  |
| 53  |                 | 54  |
| 54  |                 | 55  |
| 55  |                 | 56  |
| 56  |                 | 57  |
| 57  |                 | 58  |
| 58  |                 | 59  |
| 59  |                 | 60  |
| 60  |                 | 61  |
| 61  |                 | 62  |
| 62  |                 | 63  |
| 63  |                 | 64  |
| 64  |                 | 65  |
| 65  |                 | 66  |
| 66  |                 | 67  |
| 67  |                 | 68  |
| 68  |                 | 69  |
| 69  |                 | 70  |
| 70  |                 | 71  |
| 71  |                 | 72  |
| 72  |                 | 73  |
| 73  |                 | 74  |
| 74  |                 | 75  |
| 75  |                 | 76  |
| 76  |                 | 77  |
| 77  |                 | 78  |
| 78  |                 | 79  |
| 79  |                 | 80  |
| 80  |                 | 81  |
| 81  |                 | 82  |
| 82  |                 | 83  |
| 83  |                 | 84  |
| 84  |                 | 85  |
| 85  |                 | 86  |
| 86  |                 | 87  |
| 87  |                 | 88  |
| 88  |                 | 89  |
| 89  |                 | 90  |
| 90  |                 | 91  |
| 91  |                 | 92  |
| 92  |                 | 93  |
| 93  |                 | 94  |
| 94  |                 | 95  |
| 95  |                 | 96  |
| 96  |                 | 97  |
| 97  |                 | 98  |
| 98  |                 | 99  |
| 99  |                 | 100 |
| 100 |                 | 101 |
| 101 |                 | 102 |
| 102 |                 | 103 |
| 103 |                 | 104 |
| 104 |                 | 105 |
| 105 |                 | 106 |
| 106 |                 | 107 |
| 107 |                 | 108 |
| 108 |                 | 109 |
| 109 |                 | 110 |
| 110 |                 | 111 |
| 111 |                 | 112 |
| 112 |                 | 113 |
| 113 |                 | 114 |
| 114 |                 | 115 |
| 115 |                 | 116 |
| 116 |                 | 117 |
| 117 |                 | 118 |
| 118 |                 | 119 |
| 119 |                 | 120 |
| 120 |                 | 121 |
| 121 |                 | 122 |
| 122 |                 | 123 |
| 123 |                 | 124 |
| 124 |                 | 125 |
| 125 |                 | 126 |
| 126 |                 | 127 |
| 127 |                 | 128 |
| 128 |                 | 129 |
| 129 |                 | 130 |
| 130 |                 | 131 |
| 131 |                 | 132 |
| 132 |                 | 133 |
| 133 |                 | 134 |
| 134 |                 | 135 |
| 135 |                 | 136 |
| 136 |                 | 137 |
| 137 |                 | 138 |
| 138 |                 | 139 |
| 139 |                 | 140 |
| 140 |                 | 141 |
| 141 |                 | 142 |
| 142 |                 | 143 |
| 143 |                 | 144 |
| 144 |                 | 145 |
| 145 |                 | 146 |
| 146 |                 | 147 |
| 147 |                 | 148 |
| 148 |                 | 149 |
| 149 |                 | 150 |
| 150 |                 | 151 |
| 151 |                 | 152 |
| 152 |                 | 153 |
| 153 |                 | 154 |
| 154 |                 | 155 |
| 155 |                 | 156 |
| 156 |                 | 157 |
| 157 |                 | 158 |
| 158 |                 | 159 |
| 159 |                 | 160 |
| 160 |                 | 161 |
| 161 |                 | 162 |
| 162 |                 | 163 |
| 163 |                 | 164 |
| 164 |                 | 165 |
| 165 |                 | 166 |
| 166 |                 | 167 |
| 167 |                 | 168 |
| 168 |                 | 169 |
| 169 |                 | 170 |
| 170 |                 | 171 |
| 171 |                 | 172 |
| 172 |                 | 173 |
| 173 |                 | 174 |
| 174 |                 | 175 |
| 175 |                 | 176 |
| 176 |                 | 177 |
| 177 |                 | 178 |
| 178 |                 | 179 |
| 179 |                 | 180 |
| 180 |                 | 181 |
| 181 |                 | 182 |
| 182 |                 | 183 |
| 183 |                 | 184 |
| 184 |                 | 185 |
| 185 |                 | 186 |
| 186 |                 | 187 |
| 187 |                 | 188 |
| 188 |                 | 189 |
| 189 |                 | 190 |
| 190 |                 | 191 |
| 191 |                 | 192 |
| 192 |                 | 193 |
| 193 |                 | 194 |
| 194 |                 | 195 |
| 195 |                 | 196 |
| 196 |                 | 197 |
| 197 |                 | 198 |
| 198 |                 | 199 |
| 199 |                 | 200 |
| 200 |                 | 201 |
| 201 |                 | 202 |
| 202 |                 | 203 |
| 203 |                 | 204 |
| 204 |                 | 205 |
| 205 |                 | 206 |
| 206 |                 | 207 |
| 207 |                 | 208 |
| 208 |                 | 209 |
| 209 |                 | 210 |
| 210 |                 | 211 |
| 211 |                 | 212 |
| 212 |                 | 213 |
| 213 |                 | 214 |
| 214 |                 | 215 |
| 215 |                 | 216 |
| 216 |                 | 217 |
| 217 |                 | 218 |
| 218 |                 | 219 |
| 219 |                 | 220 |
| 220 |                 | 221 |
| 221 |                 | 222 |
| 222 |                 | 223 |
| 223 |                 | 224 |
| 224 |                 | 225 |
| 225 |                 | 226 |
| 226 |                 | 227 |
| 227 |                 | 228 |
| 228 |                 | 229 |
| 229 |                 | 230 |
| 230 |                 | 231 |
| 231 |                 | 232 |
| 232 |                 | 233 |
| 233 |                 | 234 |
| 234 |                 | 235 |
| 235 |                 | 236 |
| 236 |                 | 237 |
| 237 |                 | 238 |
| 238 |                 | 239 |
| 239 |                 | 240 |
| 240 |                 | 241 |
| 241 |                 | 242 |
| 242 |                 | 243 |
| 243 |                 | 244 |
| 244 |                 | 245 |
| 245 |                 | 246 |
| 246 |                 | 247 |
| 247 |                 | 248 |
| 248 |                 | 249 |
| 249 |                 | 250 |
| 250 |                 | 251 |
| 251 |                 | 252 |
| 252 |                 | 253 |
| 253 |                 | 254 |
| 254 |                 | 255 |
| 255 |                 | 256 |
| 256 |                 | 257 |
| 257 |                 | 258 |
| 258 |                 | 259 |
| 259 |                 | 260 |
| 260 |                 | 261 |
| 261 |                 | 262 |
| 262 |                 | 263 |
| 263 |                 | 264 |
| 264 |                 | 265 |
| 265 |                 | 266 |
| 266 |                 | 267 |
| 267 |                 | 268 |
| 268 |                 | 269 |
| 269 |                 | 270 |
| 270 |                 | 271 |
| 271 |                 | 272 |
| 272 |                 | 273 |
| 273 |                 | 274 |
| 274 |                 | 275 |
| 275 |                 | 276 |
| 276 |                 | 277 |
| 277 |                 | 278 |
| 278 |                 | 279 |
| 279 |                 | 280 |
| 280 |                 | 281 |
| 281 |                 | 282 |
| 282 |                 | 283 |
| 283 |                 | 284 |
| 284 |                 | 285 |
| 285 |                 | 286 |
| 286 |                 | 287 |
| 287 |                 | 288 |
| 288 |                 | 289 |
| 289 |                 | 290 |
| 290 |                 | 291 |
| 291 |                 | 292 |
| 292 |                 | 293 |
| 293 |                 | 294 |
| 294 |                 | 295 |
| 295 |                 | 296 |
| 296 |                 | 297 |
| 297 |                 | 298 |
| 298 |                 | 299 |
| 299 |                 | 300 |
| 300 |                 | 301 |
| 301 |                 | 302 |
| 302 |                 | 303 |
| 303 |                 | 304 |
| 304 |                 | 305 |
| 305 |                 | 306 |
| 306 |                 |     |

# Compose a fabric

## fabric JSON description

```
{  
  "platform": "drra",  
  "resources": [  
    {  
      "name": "swb_impl",  
      "kind": "swb"  
    },  
    {  
      "name": "iosram_top_impl",  
      "kind": "iosram_top"  
    },  
    {  
      "name": "iosram_btm_impl",  
      "kind": "iosram_btm"  
    },  
    {  
      "name": "rf_impl",  
      "kind": "rf"  
    },  
    {  
      "name": "dpu_impl",  
      "kind": "dpu"  
    }  
  ],  
  "controllers": [  
    {  
      "name": "sequencer_impl",  
      "kind": "sequencer"  
    }  
  ],  
  "cells": [  
    {  
      "name": "cell_top_impl",  
      "kind": "cell_top",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_top_impl"  
      ]  
    },  
    {  
      "name": "cell_mid_impl",  
      "kind": "cell_mid",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "rf_impl",  
        "rf_impl",  
        "rf_impl",  
        "dpu_impl"  
      ]  
    },  
    {  
      "name": "cell_btm_impl",  
      "kind": "cell_btm",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_btm_impl"  
      ]  
    }  
  ]  
}
```

1. List the components you want to use

2. Compose cells with those components

3. Compose the fabric with those cells

```
"fabric": {  
  "height": 3,  
  "width": 1,  
  "cells_list": [  
    {  
      "coordinates": [  
        {  
          "row": 0,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_top_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 1,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_mid_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 2,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_btm_impl"  
    }  
  ],  
  "custom_properties": [  
    {  
      "name": "IO_DATA_WIDTH",  
      "value": 256  
    },  
    {  
      "name": "IO_ADDR_WIDTH",  
      "value": 16  
    },  
    {  
      "name": "RESOURCE_INSTR_WIDTH",  
      "value": 27  
    },  
    {  
      "name": "ROWS",  
      "value": 3  
    },  
    {  
      "name": "COLS",  
      "value": 1  
    },  
    {  
      "name": "INSTR_DATA_WIDTH",  
      "value": 32  
    },  
    {  
      "name": "INSTR_ADDR_WIDTH",  
      "value": 6  
    },  
    {  
      "name": "INSTR_HOPS_WIDTH",  
      "value": 4  
    },  
    {  
      "name": "INSTR_OPCODE_BITWIDTH",  
      "value": 3  
    },  
    {  
      "name": "BULK_WIDTH",  
      "value": 256  
    }  
  ],  
  "interface": {  
    "input_buffer_depth": 1024,  
    "output_buffer_depth": 1024  
  }  
}
```

Input Buffer

[0,0]

sequencer\_impl

|    |    |    |    |    |   |   |   |   |   |
|----|----|----|----|----|---|---|---|---|---|
| 10 | 17 | 14 | 12 | 10 | 8 | 6 | 4 | 2 | 0 |
|    |    |    |    |    |   |   |   |   |   |
|    |    |    |    |    |   |   |   |   |   |
|    |    |    |    |    |   |   |   |   |   |
|    |    |    |    |    |   |   |   |   |   |

iosram\_top\_impl

swb\_impl

[1,0]

sequencer\_impl

|    |    |    |    |    |    |    |   |   |   |   |   |   |
|----|----|----|----|----|----|----|---|---|---|---|---|---|
| 23 | 21 | 19 | 17 | 15 | 13 | 11 | 9 | 7 | 5 | 3 | 1 | 0 |
|    |    |    |    |    |    |    |   |   |   |   |   |   |
|    |    |    |    |    |    |    |   |   |   |   |   |   |
|    |    |    |    |    |    |    |   |   |   |   |   |   |
|    |    |    |    |    |    |    |   |   |   |   |   |   |

dpu\_impl

rf\_impl

rf\_impl

rf\_impl

swb\_impl

[2,0]

sequencer\_impl

|    |    |    |    |    |   |   |   |   |   |   |
|----|----|----|----|----|---|---|---|---|---|---|
| 19 | 17 | 15 | 13 | 11 | 9 | 7 | 5 | 3 | 1 | 0 |
|    |    |    |    |    |   |   |   |   |   |   |
|    |    |    |    |    |   |   |   |   |   |   |
|    |    |    |    |    |   |   |   |   |   |   |
|    |    |    |    |    |   |   |   |   |   |   |

iosram\_btm\_impl

swb\_impl

Output Buffer

# Compose a fabric

## fabric JSON description

```
{  
  "platform": "drra",  
  "resources": [  
    {  
      "name": "swb_impl",  
      "kind": "swb"  
    },  
    {  
      "name": "iosram_top_impl",  
      "kind": "iosram_top"  
    },  
    {  
      "name": "iosram_btm_impl",  
      "kind": "iosram_btm"  
    },  
    {  
      "name": "rf_impl",  
      "kind": "rf"  
    },  
    {  
      "name": "dpu_impl",  
      "kind": "dpu"  
    }  
  ],  
  "controllers": [  
    {  
      "name": "sequencer_impl",  
      "kind": "sequencer"  
    }  
  ],  
  "cells": [  
    {  
      "name": "cell_top_impl",  
      "kind": "cell_top",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_top_impl"  
      ]  
    },  
    {  
      "name": "cell_mid_impl",  
      "kind": "cell_mid",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "rf_impl",  
        "rf_impl",  
        "rf_impl",  
        "dpu_impl"  
      ]  
    },  
    {  
      "name": "cell_btm_impl",  
      "kind": "cell_btm",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_btm_impl"  
      ]  
    }  
  ]  
}
```

1. List the components you want to use

2. Compose cells with those components

3. Compose the fabric with those cells

(4.) Add custom properties if needed

```
"fabric": {  
  "height": 3,  
  "width": 1,  
  "cells_list": [  
    {  
      "coordinates": [  
        {  
          "row": 0,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_top_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 1,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_mid_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 2,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_btm_impl"  
    }  
  ],  
  "custom_properties": [  
    {  
      "name": "IO_DATA_WIDTH",  
      "value": 256  
    },  
    {  
      "name": "IO_ADDR_WIDTH",  
      "value": 16  
    },  
    {  
      "name": "RESOURCE_INSTR_WIDTH",  
      "value": 27  
    },  
    {  
      "name": "ROWS",  
      "value": 3  
    },  
    {  
      "name": "COLS",  
      "value": 1  
    },  
    {  
      "name": "INSTR_DATA_WIDTH",  
      "value": 32  
    },  
    {  
      "name": "INSTR_ADDR_WIDTH",  
      "value": 6  
    },  
    {  
      "name": "INSTR_HOPS_WIDTH",  
      "value": 4  
    },  
    {  
      "name": "INSTR_OPCODE_BITWIDTH",  
      "value": 3  
    },  
    {  
      "name": "BULK_WIDTH",  
      "value": 256  
    }  
  ],  
  "interface": {  
    "input_buffer_depth": 1024,  
    "output_buffer_depth": 1024  
  }  
}
```

Input Buffer

[0,0]

sequencer\_impl

|    |                 |    |
|----|-----------------|----|
| 10 | iosram_top_impl | 16 |
| 11 |                 | 12 |
| 12 |                 | 13 |
| 13 |                 | 14 |
| 14 |                 | 15 |
| 15 |                 | 16 |
| 16 |                 | 17 |
| 17 |                 | 18 |
| 18 |                 | 19 |
| 19 |                 | 20 |
| 20 |                 | 21 |
| 21 |                 | 22 |
| 22 |                 | 23 |

[1,0]

sequencer\_impl

|    |          |    |
|----|----------|----|
| 20 | dpu_impl | 22 |
| 21 | rf_impl  | 23 |
| 22 | rf_impl  | 24 |
| 23 | rf_impl  | 25 |
| 24 | rf_impl  | 26 |
| 25 | swb_impl | 27 |

[2,0]

sequencer\_impl

|     |                 |     |
|-----|-----------------|-----|
| 18  | iosram_btm_impl | 20  |
| 19  |                 | 21  |
| 20  |                 | 22  |
| 21  |                 | 23  |
| 22  |                 | 24  |
| 23  |                 | 25  |
| 24  |                 | 26  |
| 25  |                 | 27  |
| 26  |                 | 28  |
| 27  |                 | 29  |
| 28  |                 | 30  |
| 29  |                 | 31  |
| 30  |                 | 32  |
| 31  |                 | 33  |
| 32  |                 | 34  |
| 33  |                 | 35  |
| 34  |                 | 36  |
| 35  |                 | 37  |
| 36  |                 | 38  |
| 37  |                 | 39  |
| 38  |                 | 40  |
| 39  |                 | 41  |
| 40  |                 | 42  |
| 41  |                 | 43  |
| 42  |                 | 44  |
| 43  |                 | 45  |
| 44  |                 | 46  |
| 45  |                 | 47  |
| 46  |                 | 48  |
| 47  |                 | 49  |
| 48  |                 | 50  |
| 49  |                 | 51  |
| 50  |                 | 52  |
| 51  |                 | 53  |
| 52  |                 | 54  |
| 53  |                 | 55  |
| 54  |                 | 56  |
| 55  |                 | 57  |
| 56  |                 | 58  |
| 57  |                 | 59  |
| 58  |                 | 60  |
| 59  |                 | 61  |
| 60  |                 | 62  |
| 61  |                 | 63  |
| 62  |                 | 64  |
| 63  |                 | 65  |
| 64  |                 | 66  |
| 65  |                 | 67  |
| 66  |                 | 68  |
| 67  |                 | 69  |
| 68  |                 | 70  |
| 69  |                 | 71  |
| 70  |                 | 72  |
| 71  |                 | 73  |
| 72  |                 | 74  |
| 73  |                 | 75  |
| 74  |                 | 76  |
| 75  |                 | 77  |
| 76  |                 | 78  |
| 77  |                 | 79  |
| 78  |                 | 80  |
| 79  |                 | 81  |
| 80  |                 | 82  |
| 81  |                 | 83  |
| 82  |                 | 84  |
| 83  |                 | 85  |
| 84  |                 | 86  |
| 85  |                 | 87  |
| 86  |                 | 88  |
| 87  |                 | 89  |
| 88  |                 | 90  |
| 89  |                 | 91  |
| 90  |                 | 92  |
| 91  |                 | 93  |
| 92  |                 | 94  |
| 93  |                 | 95  |
| 94  |                 | 96  |
| 95  |                 | 97  |
| 96  |                 | 98  |
| 97  |                 | 99  |
| 98  |                 | 100 |
| 99  |                 | 101 |
| 100 |                 | 102 |
| 101 |                 | 103 |
| 102 |                 | 104 |
| 103 |                 | 105 |
| 104 |                 | 106 |
| 105 |                 | 107 |
| 106 |                 | 108 |
| 107 |                 | 109 |
| 108 |                 | 110 |
| 109 |                 | 111 |
| 110 |                 | 112 |
| 111 |                 | 113 |
| 112 |                 | 114 |
| 113 |                 | 115 |
| 114 |                 | 116 |
| 115 |                 | 117 |
| 116 |                 | 118 |
| 117 |                 | 119 |
| 118 |                 | 120 |
| 119 |                 | 121 |
| 120 |                 | 122 |
| 121 |                 | 123 |
| 122 |                 | 124 |
| 123 |                 | 125 |
| 124 |                 | 126 |
| 125 |                 | 127 |
| 126 |                 | 128 |
| 127 |                 | 129 |
| 128 |                 | 130 |
| 129 |                 | 131 |
| 130 |                 | 132 |
| 131 |                 | 133 |
| 132 |                 | 134 |
| 133 |                 | 135 |
| 134 |                 | 136 |
| 135 |                 | 137 |
| 136 |                 | 138 |
| 137 |                 | 139 |
| 138 |                 | 140 |
| 139 |                 | 141 |
| 140 |                 | 142 |
| 141 |                 | 143 |
| 142 |                 | 144 |
| 143 |                 | 145 |
| 144 |                 | 146 |
| 145 |                 | 147 |
| 146 |                 | 148 |
| 147 |                 | 149 |
| 148 |                 | 150 |
| 149 |                 | 151 |
| 150 |                 | 152 |
| 151 |                 | 153 |
| 152 |                 | 154 |
| 153 |                 | 155 |
| 154 |                 | 156 |
| 155 |                 | 157 |
| 156 |                 | 158 |
| 157 |                 | 159 |
| 158 |                 | 160 |
| 159 |                 | 161 |
| 160 |                 | 162 |
| 161 |                 | 163 |
| 162 |                 | 164 |
| 163 |                 | 165 |
| 164 |                 | 166 |
| 165 |                 | 167 |
| 166 |                 | 168 |
| 167 |                 | 169 |
| 168 |                 | 170 |
| 169 |                 | 171 |
| 170 |                 | 172 |
| 171 |                 | 173 |
| 172 |                 | 174 |
| 173 |                 | 175 |
| 174 |                 | 176 |
| 175 |                 | 177 |
| 176 |                 | 178 |
| 177 |                 | 179 |
| 178 |                 | 180 |
| 179 |                 | 181 |
| 180 |                 | 182 |
| 181 |                 | 183 |
| 182 |                 | 184 |
| 183 |                 | 185 |
| 184 |                 | 186 |
| 185 |                 | 187 |
| 186 |                 | 188 |
| 187 |                 | 189 |
| 188 |                 | 190 |
| 189 |                 | 191 |
| 190 |                 | 192 |
| 191 |                 | 193 |
| 192 |                 | 194 |
| 193 |                 | 195 |
| 194 |                 | 196 |
| 195 |                 | 197 |
| 196 |                 | 198 |
| 197 |                 | 199 |
| 198 |                 | 200 |
| 199 |                 | 201 |
| 200 |                 | 202 |
| 201 |                 | 203 |
| 202 |                 | 204 |
| 203 |                 | 205 |
| 204 |                 | 206 |
| 205 |                 | 207 |
| 206 |                 | 208 |
| 207 |                 | 209 |
| 208 |                 | 210 |
| 209 |                 | 211 |
| 210 |                 | 212 |
| 211 |                 | 213 |
| 212 |                 | 214 |
| 213 |                 | 215 |
| 214 |                 | 216 |
| 215 |                 | 217 |
| 216 |                 | 218 |
| 217 |                 | 219 |
| 218 |                 | 220 |
| 219 |                 | 221 |
| 220 |                 | 222 |
| 221 |                 | 223 |
| 222 |                 | 224 |
| 223 |                 | 225 |
| 224 |                 | 226 |
| 225 |                 | 227 |
| 226 |                 | 228 |
| 227 |                 | 229 |
| 228 |                 | 230 |
| 229 |                 | 231 |
| 230 |                 | 232 |
| 231 |                 | 233 |
| 232 |                 | 234 |
| 233 |                 | 235 |
| 234 |                 | 236 |
| 235 |                 | 237 |
| 236 |                 | 238 |
| 237 |                 | 239 |
| 238 |                 | 240 |
| 239 |                 | 241 |
| 240 |                 | 242 |
| 241 |                 | 243 |
| 242 |                 | 244 |
| 243 |                 | 245 |
| 244 |                 | 246 |
| 245 |                 | 247 |
| 246 |                 | 248 |
| 247 |                 | 249 |
| 248 |                 | 250 |
| 249 |                 | 251 |
| 250 |                 | 252 |
| 251 |                 | 253 |
| 252 |                 | 254 |
| 253 |                 | 255 |
| 254 |                 | 256 |
| 255 |                 | 257 |
| 256 |                 | 258 |
| 257 |                 | 259 |
| 258 |                 | 260 |
| 259 |                 | 261 |
| 260 |                 | 262 |
| 261 |                 | 263 |
| 262 |                 | 264 |
| 263 |                 | 265 |
| 264 |                 | 266 |
| 265 |                 | 267 |

# Compose a fabric

## fabric JSON description

```
{  
  "platform": "drra",  
  "resources": [  
    {  
      "name": "swb_impl",  
      "kind": "swb"  
    },  
    {  
      "name": "iosram_top_impl",  
      "kind": "iosram_top"  
    },  
    {  
      "name": "iosram_btm_impl",  
      "kind": "iosram_btm"  
    },  
    {  
      "name": "rf_impl",  
      "kind": "rf"  
    },  
    {  
      "name": "dpu_impl",  
      "kind": "dpu"  
    }  
  ],  
  "controllers": [  
    {  
      "name": "sequencer_impl",  
      "kind": "sequencer"  
    }  
  ],  
  "cells": [  
    {  
      "name": "cell_top_impl",  
      "kind": "cell_top",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_top_impl"  
      ]  
    },  
    {  
      "name": "cell_mid_impl",  
      "kind": "cell_mid",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "rf_impl",  
        "rf_impl",  
        "rf_impl",  
        "dpu_impl"  
      ]  
    },  
    {  
      "name": "cell_btm_impl",  
      "kind": "cell_btm",  
      "controller": "sequencer_impl",  
      "resource_list": [  
        "swb_impl",  
        "iosram_btm_impl"  
      ]  
    }  
  ]  
}
```

1. List the components you want to use

2. Compose cells with those components

3. Compose the fabric with those cells

(4.) Add custom properties if needed

5. Specify the interface depth

```
"fabric": {  
  "height": 3,  
  "width": 1,  
  "cells_list": [  
    {  
      "coordinates": [  
        {  
          "row": 0,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_top_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 1,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_mid_impl"  
    },  
    {  
      "coordinates": [  
        {  
          "row": 2,  
          "col": 0  
        }  
      ],  
      "cell_name": "cell_btm_impl"  
    }  
  ],  
  "custom_properties": [  
    {  
      "name": "IO_DATA_WIDTH",  
      "value": 256  
    },  
    {  
      "name": "IO_ADDR_WIDTH",  
      "value": 16  
    },  
    {  
      "name": "RESOURCE_INSTR_WIDTH",  
      "value": 27  
    },  
    {  
      "name": "ROWS",  
      "value": 3  
    },  
    {  
      "name": "COLS",  
      "value": 1  
    },  
    {  
      "name": "INSTR_DATA_WIDTH",  
      "value": 32  
    },  
    {  
      "name": "INSTR_ADDR_WIDTH",  
      "value": 6  
    },  
    {  
      "name": "INSTR_HOPS_WIDTH",  
      "value": 4  
    },  
    {  
      "name": "INSTR_OPCODE_BITWIDTH",  
      "value": 3  
    },  
    {  
      "name": "BULK_WIDTH",  
      "value": 256  
    }  
  ],  
  "interface": {  
    "input_buffer_depth": 1024,  
    "output_buffer_depth": 1024  
  }  
}
```

Input Buffer

[0,0]

sequencer\_impl

|    |                 |    |
|----|-----------------|----|
| 10 | iosram_top_impl | 16 |
| 11 |                 | 12 |
| 12 |                 | 13 |
| 13 |                 | 14 |
| 14 |                 | 15 |
| 15 |                 | 16 |
| 16 |                 | 17 |
| 17 |                 | 18 |
| 18 |                 | 19 |
| 19 |                 | 20 |
| 20 |                 | 21 |
| 21 |                 | 22 |
| 22 |                 | 23 |

[1,0]

sequencer\_impl

|    |          |    |
|----|----------|----|
| 20 | dpu_impl | 22 |
| 21 | rf_impl  | 21 |
| 22 | rf_impl  | 20 |
| 23 | rf_impl  | 19 |
| 19 | rf_impl  | 18 |
| 18 | rf_impl  | 17 |
| 17 | rf_impl  | 16 |
| 16 | rf_impl  | 15 |
| 15 | rf_impl  | 14 |
| 14 | rf_impl  | 13 |
| 13 | rf_impl  | 12 |
| 12 | rf_impl  | 11 |
| 11 | rf_impl  | 10 |
| 10 | rf_impl  | 9  |
| 9  | rf_impl  | 8  |
| 8  | rf_impl  | 7  |
| 7  | rf_impl  | 6  |
| 6  | rf_impl  | 5  |
| 5  | rf_impl  | 4  |
| 4  | rf_impl  | 3  |
| 3  | rf_impl  | 2  |
| 2  | rf_impl  | 1  |
| 1  | rf_impl  | 0  |

[2,0]

sequencer\_impl

|    |                 |    |
|----|-----------------|----|
| 18 | iosram_btm_impl | 20 |
| 19 |                 | 19 |
| 20 |                 | 18 |
| 21 |                 | 17 |
| 22 |                 | 16 |
| 16 |                 | 15 |
| 15 |                 | 14 |
| 14 |                 | 13 |
| 13 |                 | 12 |
| 12 |                 | 11 |
| 11 |                 | 10 |
| 10 |                 | 9  |
| 9  |                 | 8  |
| 8  |                 | 7  |
| 7  |                 | 6  |
| 6  |                 | 5  |
| 5  |                 | 4  |
| 4  |                 | 3  |
| 3  |                 | 2  |
| 2  |                 | 1  |
| 1  |                 | 0  |

Output Buffer

**if you need a new  
component...**

# Creating a new DRRA component

# **Creating a new DRRA component**

## **Main steps**

# Creating a new DRRA component

## Main steps

1. Write the architecture description file (arch.json)

# Creating a new DRRA component

## Main steps

1. Write the architecture description file (arch.json)
2. Write the instruction set description file (isa.json)

# Creating a new DRRA component

## Main steps

1. Write the architecture description file (arch.json)
2. Write the instruction set description file (isa.json)
3. Generate the component structure using Vesyla (Linux cmd)

# Creating a new DRRA component

## Main steps

1. Write the architecture description file (arch.json)
2. Write the instruction set description file (isa.json)
3. Generate the component structure using Vesyla (Linux cmd)
4. Implement the simulation model (SST model)

# Creating a new DRRA component

## Main steps

1. Write the architecture description file (arch.json)
2. Write the instruction set description file (isa.json)
3. Generate the component structure using Vesyla (Linux cmd)
4. Implement the simulation model (SST model)
5. Implement the timing expression model (Rust code)

# Creating a new DRRA component

## Main steps

1. Write the architecture description file (arch.json)
2. Write the instruction set description file (isa.json)
3. Generate the component structure using Vesyla (Linux cmd)
4. Implement the simulation model (SST model)
5. Implement the timing expression model (Rust code)
6. Implement the RTL design (SystemVerilog)

# 1. Architecture description

## arch.json

```
{  
  "kind": "dpu",  
  "type": "resource",  
  "size": 2,  
  "io_input": false,  
  "io_output": false,  
  "custom_properties": [],  
  "required_parameters": [  
    "IO_DATA_WIDTH",  
    "IO_ADDR_WIDTH",  
    "RESOURCE_INSTR_WIDTH",  
    "BULK_BITWIDTH",  
    "WORD_BITWIDTH",  
    "NUM_SLOTS",  
    "FSM_PER_SLOT",  
    "INSTR_OPCODE_BITWIDTH"  
  ]  
}
```

# 1. Architecture description

## arch.json

List of fields of the architecture description:

```
{  
  "kind": "dpu",  
  "type": "resource",  
  "size": 2,  
  "io_input": false,  
  "io_output": false,  
  "custom_properties": [],  
  "required_parameters": [  
    "IO_DATA_WIDTH",  
    "IO_ADDR_WIDTH",  
    "RESOURCE_INSTR_WIDTH",  
    "BULK_BITWIDTH",  
    "WORD_BITWIDTH",  
    "NUM_SLOTS",  
    "FSM_PER_SLOT",  
    "INSTR_OPCODE_BITWIDTH"  
  ]  
}
```

# 1. Architecture description

## arch.json

```
{  
  "kind": "dpu",  
  "type": "resource",  
  "size": 2,  
  "io_input": false,  
  "io_output": false,  
  "custom_properties": [],  
  "required_parameters": [  
    "IO_DATA_WIDTH",  
    "IO_ADDR_WIDTH",  
    "RESOURCE_INSTR_WIDTH",  
    "BULK_BITWIDTH",  
    "WORD_BITWIDTH",  
    "NUM_SLOTS",  
    "FSM_PER_SLOT",  
    "INSTR_OPCODE_BITWIDTH"  
  ]  
}
```

List of fields of the architecture description:

- **kind:** name

# 1. Architecture description

## arch.json

```
{  
  "kind": "dpu",  
  "type": "resource",  
  "size": 2,  
  "io_input": false,  
  "io_output": false,  
  "custom_properties": [],  
  "required_parameters": [  
    "IO_DATA_WIDTH",  
    "IO_ADDR_WIDTH",  
    "RESOURCE_INSTR_WIDTH",  
    "BULK_BITWIDTH",  
    "WORD_BITWIDTH",  
    "NUM_SLOTS",  
    "FSM_PER_SLOT",  
    "INSTR_OPCODE_BITWIDTH"  
  ]  
}
```

List of fields of the architecture description:

- **kind**: name
- **type**: resource or controller

# 1. Architecture description

## arch.json

```
{  
    "kind": "dpu",  
    "type": "resource",  
    "size": 2,  
    "io_input": false,  
    "io_output": false,  
    "custom_properties": [],  
    "required_parameters": [  
        "IO_DATA_WIDTH",  
        "IO_ADDR_WIDTH",  
        "RESOURCE_INSTR_WIDTH",  
        "BULK_BITWIDTH",  
        "WORD_BITWIDTH",  
        "NUM_SLOTS",  
        "FSM_PER_SLOT",  
        "INSTR_OPCODE_BITWIDTH"  
    ]  
}
```

List of fields of the architecture description:

- **kind**: name
- **type**: resource or controller
- **size**: number of slots (default: 1)

# 1. Architecture description

## arch.json

```
{  
    "kind": "dpu",  
    "type": "resource",  
    "size": 2,  
    "io_input": false,  
    "io_output": false,  
    "custom_properties": [],  
    "required_parameters": [  
        "IO_DATA_WIDTH",  
        "IO_ADDR_WIDTH",  
        "RESOURCE_INSTR_WIDTH",  
        "BULK_BITWIDTH",  
        "WORD_BITWIDTH",  
        "NUM_SLOTS",  
        "FSM_PER_SLOT",  
        "INSTR_OPCODE_BITWIDTH"  
    ]  
}
```

List of fields of the architecture description:

- **kind**: name
- **type**: resource or controller
- **size**: number of slots (default: 1)
- **io\_input**: whether the component connects to input buffer (default: false)

# 1. Architecture description

## arch.json

```
{  
    "kind": "dpu",  
    "type": "resource",  
    "size": 2,  
    "io_input": false,  
    "io_output": false,  
    "custom_properties": [],  
    "required_parameters": [  
        "IO_DATA_WIDTH",  
        "IO_ADDR_WIDTH",  
        "RESOURCE_INSTR_WIDTH",  
        "BULK_BITWIDTH",  
        "WORD_BITWIDTH",  
        "NUM_SLOTS",  
        "FSM_PER_SLOT",  
        "INSTR_OPCODE_BITWIDTH"  
    ]  
}
```

List of fields of the architecture description:

- **kind**: name
- **type**: resource or controller
- **size**: number of slots (default: 1)
- **io\_input**: whether the component connects to input buffer (default: false)
- **io\_output**: whether the component connects to output buffer (default: false)

# 1. Architecture description

## arch.json

```
{  
    "kind": "dpu",  
    "type": "resource",  
    "size": 2,  
    "io_input": false,  
    "io_output": false,  
    "custom_properties": [],  
    "required_parameters": [  
        "IO_DATA_WIDTH",  
        "IO_ADDR_WIDTH",  
        "RESOURCE_INSTR_WIDTH",  
        "BULK_BITWIDTH",  
        "WORD_BITWIDTH",  
        "NUM_SLOTS",  
        "FSM_PER_SLOT",  
        "INSTR_OPCODE_BITWIDTH"  
    ]  
}
```

List of fields of the architecture description:

- **kind**: name
- **type**: resource or controller
- **size**: number of slots (default: 1)
- **io\_input**: whether the component connects to input buffer (default: false)
- **io\_output**: whether the component connects to output buffer (default: false)
- **custom\_properties**: (default: [ ] )

# 1. Architecture description

## arch.json

```
{  
    "kind": "dpu",  
    "type": "resource",  
    "size": 2,  
    "io_input": false,  
    "io_output": false,  
    "custom_properties": [],  
    "required_parameters": [  
        "IO_DATA_WIDTH",  
        "IO_ADDR_WIDTH",  
        "RESOURCE_INSTR_WIDTH",  
        "BULK_BITWIDTH",  
        "WORD_BITWIDTH",  
        "NUM_SLOTS",  
        "FSM_PER_SLOT",  
        "INSTR_OPCODE_BITWIDTH"  
    ]  
}
```

List of fields of the architecture description:

- **kind**: name
- **type**: resource or controller
- **size**: number of slots (default: 1)
- **io\_input**: whether the component connects to input buffer (default: false)
- **io\_output**: whether the component connects to output buffer (default: false)
- **custom\_properties**: (default: [ ] )
- **required\_parameters**: (default: [ ] )

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        }  
      ]  
    }  
  ]  
}
```

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format:** defines the total, type, opcode and slot bit width

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format:** defines the total, type, opcode and slot bit width
- **instructions:** list of instructions for the component

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format:** defines the total, type, opcode and slot bit width
- **instructions:** list of instructions for the component
- **name:** name of the instruction

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format:** defines the total, type, opcode and slot bit width
- **instructions:** list of instructions for the component
  - **name:** name of the instruction
  - **opcode** of the instruction

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format**: defines the total, type, opcode and slot bit width
- **instructions**: list of instructions for the component
  - **name**: name of the instruction
  - **opcode** of the instruction
  - **segments** (i.e., fields):

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format**: defines the total, type, opcode and slot bit width
- **instructions**: list of instructions for the component
  - **name**: name of the instruction
  - **opcode** of the instruction
  - **segments** (i.e., fields):
    - **name** of the field

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format**: defines the total, type, opcode and slot bit width
- **instructions**: list of instructions for the component
  - **name**: name of the instruction
  - **opcode** of the instruction
  - **segments** (i.e., fields):
    - **name** of the field
    - **comment** (description of the field)

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format**: defines the total, type, opcode and slot bit width
- **instructions**: list of instructions for the component
  - **name**: name of the instruction
  - **opcode** of the instruction
  - **segments** (i.e., fields):
    - **name** of the field
    - **comment** (description of the field)
    - **bitwidth** of the field

# 2. Instruction set description

## isa.json

```
{  
  "format": {  
    "instr_bitwidth": 32,  
    "instr_type_bitwidth": 1,  
    "instr_opcode_bitwidth": 3,  
    "instr_slot_bitwidth": 4  
  },  
  "instructions": [  
    {  
      "name": "dpu",  
      "opcode": 3,  
      "instr_type": 1,  
      "segments": [  
        {  
          "name": "option",  
          "comment": "Configuration option  
          "bitwidth": 2  
        },  
        {  
          "name": "mode",  
          "comment": "DPU mode",  
          "bitwidth": 5,  
          "verbo_map": [  
            {  
              "key": 0,  
              "val": "idle"  
            },  
            {  
              "key": 1,  
              "val": "add"  
            },  
            {  
              "key": 2,  
              "val": "sum_acc"  
            },  
            {  
              "key": 3,  
              "val": "add_const"  
            },  
            {  
              "key": 4,  
              "val": "subt"  
            }  
          ]  
        ]  
      ]  
    }  
  ]  
}
```

- **format**: defines the total, type, opcode and slot bit width
- **instructions**: list of instructions for the component
  - **name**: name of the instruction
  - **opcode** of the instruction
  - **segments** (i.e., fields):
    - **name** of the field
    - **comment** (description of the field)
    - **bitwidth** of the field
    - **verbo\_map**: string mapping for each value of the field

# 3. Generating the file structure

```
vesyla component create -a arch.json -i isa.json -o ./dpu_test
```

```
dpu_test
├── Bender.yml
├── CMakeLists.txt
├── compile_util
│   └── Cargo.toml
└── src
    └── main.rs
rtl
└── dpu_pkg.sv.j2
    └── dpu.sv.j2
sst
└── CMakeLists.txt
    ├── dpu.cpp
    ├── dpu.h
    ├── dpu_pkg.cpp
    └── dpu_pkg.h
tb
└── component_tb.sv
```

# 3. Generating the file structure

```
vesyla component create -a arch.json -i isa.json -o ./dpu_test
```



# 3. Generating the file structure

```
vesyla component create -a arch.json -i isa.json -o ./dpu_test
```



# 3. Generating the file structure

```
vesyla component create -a arch.json -i isa.json -o ./dpu_test
```



# 4. Implement the simulation model

## SST

```
sst
├── CMakeLists.txt
├── dpu.cpp
├── dpu.h
└── dpu_pkg.cpp
    dpu_pkg.h
```

Inside the SST folder, you get:

- **CMakeLists.txt**: used to build the C++ code
- **dpu\_pkg.cpp & dpu\_pkg.h**: contain ISA definitions
- **dpu.cpp** and **dpu.h**: here, you implement the functionality

```
void Dpu::handleDPU(const DPU_PKG::DPUInstruction &instr) {
    out.output(
        "dpu (slot=%d, option=%d, mode=%d, immediate=%d)\n",
        instr.slot, instr.option, instr.mode, instr.immediate);

    // TODO: implement instruction behavior
}

void Dpu::handleREP(const DPU_PKG::REPIInstruction &instr) {
    out.output(
        "rep (slot=%d, port=%d, level=%d, iter=%d, step=%d, delay=%d)\n",
        instr.slot, instr.port, instr.level, instr.iter, instr.step, instr.delay);

    // TODO: implement instruction behavior
}

void Dpu::handleREPX(const DPU_PKG::REPXInstruction &instr) {
    out.output(
        "repx (slot=%d, port=%d, level=%d, iter=%d, step=%d, delay=%d)\n",
        instr.slot, instr.port, instr.level, instr.iter, instr.step, instr.delay);

    // TODO: implement instruction behavior
}
```

```

void Dpu::handleDPU(const DPU_PKG::DPUInstruction &instr) {
    out.output(
        "dpu (slot=%d, option=%d, mode=%d, immediate=%d)\n",
        instr.slot, instr.option, instr.mode, instr.immediate);

    // TODO: implement instruction behavior
}

void Dpu::handleREP(const DPU_PKG::REPIInstruction &instr) {
    out.output(
        "rep (slot=%d, port=%d, level=%d, iter=%d, step=%d, delay=%d)\n",
        instr.slot, instr.port, instr.level, instr.iter, instr.step, instr.delay);

    // TODO: implement instruction behavior
}

void Dpu::handleREPX(const DPU_PKG::REPXInstruction &instr) {
    out.output(
        "repx (slot=%d, port=%d, level=%d, iter=%d, step=%d, delay=%d)\n",
        instr.slot, instr.port, instr.level, instr.iter, instr.step, instr.delay);

    // TODO: implement instruction behavior
}

void Dpu::handleFSM(const DPU_PKG::FSMInstruction &instr) {
    out.output(
        "fsm (slot=%d, port=%d, delay_0=%d, delay_1=%d, delay_2=%d)\n",
        instr.slot, instr.port, instr.delay_0, instr.delay_1, instr.delay_2);

    // TODO: implement instruction behavior
}

```

# 5. Implement the timing model

## Rust script

The Vesyla compiler needs to know  
***how your instructions affect the timing model of a program***

# 5. Implement the timing model

## Rust script

The Vesyla compiler needs to know  
***how your instructions affect the timing model of a program***

Example: issuing a “rep” and “fsm” instructions for the DPU

```
cell {  
    dpu(mode="add_const", immediate=1)  
    rep(level=0, iter=29, step=1, delay=1)  
    dpu(mode="div")  
    fsm(delay_0=4)  
}
```

# 5. Implement the timing model

## Rust script

The Vesyla compiler needs to know  
***how your instructions affect the timing model of a program***

Example: issuing a “rep” and “fsm” instructions for the DPU

```
cell {  
    dpu(mode="add_const", immediate=1)  
    rep(level=0, iter=29, step=1, delay=1)          e0  
    dpu(mode="div")  
    fsm(delay_0=4)  
}
```

# 5. Implement the timing model

## Rust script

The Vesyla compiler needs to know  
***how your instructions affect the timing model of a program***

Example: issuing a “rep” and “fsm” instructions for the DPU

```
cell {  
    dpu(mode="add_const", immediate=1)  
    rep(level=0, iter=29, step=1, delay=1)           R<30,1>( e0 )  
    dpu(mode="div")  
    fsm(delay_0=4)  
}
```

# 5. Implement the timing model

## Rust script

The Vesyla compiler needs to know  
***how your instructions affect the timing model of a program***

Example: issuing a “rep” and “fsm” instructions for the DPU

```
cell {  
    dpu(mode="add_const", immediate=1)  
    rep(level=0, iter=29, step=1, delay=1)           R<30,1>( e0 )  e1  
    dpu(mode="div")  
    fsm(delay_0=4)  
}
```

# 5. Implement the timing model

## Rust script

The Vesyla compiler needs to know  
***how your instructions affect the timing model of a program***

Example: issuing a “rep” and “fsm” instructions for the DPU

```
cell {  
    dpu(mode="add_const", immediate=1)  
    rep(level=0, iter=29, step=1, delay=1)      T<4>( R<30,1>( e0 ), e1 )  
    dpu(mode="div")  
    fsm(delay_0=4)  
}
```

# 5. Implement the timing model

## Rust script

So, we need a script that the compiler can use to get this timing model from your component instruction set

Generated template script

```
fn get_timing_model(op: Op) -> String {
    let mut t: HashMap<i64, String> = HashMap::new();
    let mut r: HashMap<i64, (String, String)> = HashMap::new();
    let mut expr = "e0".to_string();

    for instr in op.body {
        let instr_segments = instr.params;
        match instr.kind.as_str() {
            "dpu" => {
                let mut option = instr_segments.get_value("option");
                let mut mode = instr_segments.get_value("mode");
                let mut immediate = instr_segments.get_value("immediate");
                todo!("Implement timing model for instruction dpu");
            },
            "rep" => {
                let mut port = instr_segments.get_value("port");
                let mut level = instr_segments.get_value("level");
                let mut iter = instr_segments.get_value("iter");
                let mut step = instr_segments.get_value("step");
                let mut delay = instr_segments.get_value("delay");
                todo!("Implement timing model for instruction rep");
            },
            "repx" => {
                let mut port = instr_segments.get_value("port");
                let mut level = instr_segments.get_value("level");
                let mut iter = instr_segments.get_value("iter");
                let mut step = instr_segments.get_value("step");
                let mut delay = instr_segments.get_value("delay");
                todo!("Implement timing model for instruction repx");
            },
            "fsm" => {
                let mut port = instr_segments.get_value("port");
                let mut delay_0 = instr_segments.get_value("delay_0");
                let mut delay_1 = instr_segments.get_value("delay_1");
                let mut delay_2 = instr_segments.get_value("delay_2");
                todo!("Implement timing model for instruction fsm");
            },
            _ => {
                panic!("Unknown instruction kind: {}", instr.kind);
            }
        }
    }

    r.insert(0, "0".to_string());
    r.insert(1, "0".to_string());
    r.insert(2, "0".to_string());
    t.insert(0, expr);
    t.insert(1, expr);
    t.insert(2, expr);

    return r
}
```

Completed script

```
fn get_timing_model(op: Op) -> String {
    let mut t: HashMap<i64, String> = HashMap::new();
    let mut r: HashMap<i64, (String, String)> = HashMap::new();
    let mut expr = "e0".to_string();

    for instr in op.body {
        let instr_segments = instr.params;
        match instr.kind.as_str() {
            "dpu" => {}
            "rep" => {
                let _port = instr_segments.get_value("port");
                let level = instr_segments.get_value("level");
                let mut iter = instr_segments.get_value("iter");
                let _step = instr_segments.get_value("step");
                let delay = instr_segments.get_value("delay");
                iter = (iter.parse::<i64>().unwrap() + 1).to_string();
                r.insert(
                    level.parse::<i64>().unwrap(),
                    (iter.to_string(), delay.to_string()),
                );
            }
            "repx" => {}
            "fsm" => {
                let _port = instr_segments.get_value("port");
                let delay_0 = instr_segments.get_value("delay_0");
                let delay_1 = instr_segments.get_value("delay_1");
                let delay_2 = instr_segments.get_value("delay_2");

                t.insert(0, "0".to_string());
                t.insert(1, "0".to_string());
                t.insert(2, "0".to_string());
                t.insert(0, delay_0);
                t.insert(1, delay_1);
                t.insert(2, delay_2);
            }
            _ => {
                panic!("Unknown instruction kind: {}", instr.kind);
            }
        }
    }

    return r
}
```

# 6. Implement the RTL design

## SystemVerilog / Jinja template

```
rtl
└── dpu_pkg.sv.j2
└── dpu.sv.j2
```

Inside the RTL folder, you get:

- `dpu_pkg.sv.j2`: contain instruction pack/unpack functions and custom properties declared in `arch.json`
- `dpu.sv.j2`: here, you implement the RTL design

# 6. Implement the RTL design

## SystemVerilog / Ninja template

```
rtl
└── dpu_pkg.sv.j2
└── dpu.sv.j2
```

Inside the RTL folder, you get:

- `dpu_pkg.sv.j2`: contain instruction pack/unpack functions and custom properties declared in `arch.json`
- `dpu.sv.j2`: here, you implement the RTL design

`dpu.sv.j2`

```
// vesyla_template_start defines
`define {{name}} {{fingerprint}}
`define {{name}}_pkg {{fingerprint}}_pkg
// vesyla_template_end defines

// vesyla_template_start module_head
{%
  if not already_defined %}
  module {{fingerprint}}
    import {{fingerprint}}_pkg::*;
// vesyla_template_end module_head
(
  input logic clk_0,
  input logic rst_n_0,
  input logic instr_en_0,
  input logic [RESOURCE_INSTR_WIDTH-1:0] instr_0,
  input logic [3:0] activate_0,
  input logic [WORD_BITWIDTH-1:0] word_data_in_0,
  output logic [WORD_BITWIDTH-1:0] word_data_out_0,
  input logic [BULK_BITWIDTH-1:0] bulk_data_in_0,
  output logic [BULK_BITWIDTH-1:0] bulk_data_out_0,
  input logic clk_1,
  input logic rst_n_1,
  input logic instr_en_1,
  input logic [RESOURCE_INSTR_WIDTH-1:0] instr_1,
  input logic [3:0] activate_1,
```

Some considerations:

- module interface is already declared to fit DRRA slots
- inside the module, you have complete SystemVerilog functionality
- you should not edit the interface or the `jinja` tags
  - `{} ... {}` and `{% ... %}`

**Congratulations!**  
**You have a new component!**

# **Advancement status**

**What do we have? What are we missing?**

# **Advancement status**

**What do we have? What are we missing?**

**What we have**

# **Advancement status**

**What do we have? What are we missing?**

**What we have**

- Reference implementation

# Advancement status

**What do we have? What are we missing?**

What we have

- Reference implementation
- Partitioned into algorithms with clearly defined inputs/outputs

# Advancement status

## What do we have? What are we missing?

### What we have

- Reference implementation
- Partitioned into algorithms with clearly defined inputs/outputs
- Memory layout mapping for each algorithm

# Advancement status

## What do we have? What are we missing?

### What we have

- Reference implementation
- Partitioned into algorithms with clearly defined inputs/outputs
- Memory layout mapping for each algorithm
- High-level modeling of each algorithm (model 0)

# Advancement status

## What do we have? What are we missing?

### What we have

- Reference implementation
- Partitioned into algorithms with clearly defined inputs/outputs
- Memory layout mapping for each algorithm
- High-level modeling of each algorithm (model 0)
- DRRA fabric description (JSON file listing cells, resources, etc.)

# Advancement status

## What do we have? What are we missing?

### What we have

- Reference implementation
- Partitioned into algorithms with clearly defined inputs/outputs
- Memory layout mapping for each algorithm
- High-level modeling of each algorithm (model 0)
- DRRA fabric description (JSON file listing cells, resources, etc.)

### What are missing

# Advancement status

## What do we have? What are we missing?

### What we have

- Reference implementation
- Partitioned into algorithms with clearly defined inputs/outputs
- Memory layout mapping for each algorithm
- High-level modeling of each algorithm (model 0)
- DRRA fabric description (JSON file listing cells, resources, etc.)

### What are missing

- Program to run the algorithms on the fabric

# Advancement status

## What do we have? What are we missing?

### What we have

- Reference implementation
- Partitioned into algorithms with clearly defined inputs/outputs
- Memory layout mapping for each algorithm
- High-level modeling of each algorithm (model 0)
- DRRA fabric description (JSON file listing cells, resources, etc.)

### What are missing

- Program to run the algorithms on the fabric
- Test results to check functionality and timing

# **How to program a DRRA fabric?**

## **Mapping - Assembly implementation**

# How to program a DRRA fabric?

## Mapping - Assembly implementation

- DRRA programs are written in proto-assembly (PASM)
  - **DRRA-specific** language

# How to program a DRRA fabric?

## Mapping - Assembly implementation

- DRRA programs are written in proto-assembly (PASM)
  - **DRRA-specific** language
    - Only one online tutorial

# How to program a DRRA fabric?

## Mapping - Assembly implementation

- DRRA programs are written in proto-assembly (PASM)
  - **DRRA-specific** language
    - Only one online tutorial
    - ChatGPT is clueless

# **How to program a DRRA fabric?**

## **Mapping - Assembly implementation**

# How to program a DRRA fabric?

## Mapping - Assembly implementation

- DRRA programs are written in proto-assembly (PASM)
  - **DRRA-specific** language
  - Composable instruction set (**CIS**)

# How to program a DRRA fabric?

## Mapping - Assembly implementation

- DRRA programs are written in proto-assembly (PASM)
  - DRRA-specific language
  - Composable instruction set (**CIS**)
    - The instructions you can use depend on your fabric
    - Our example fabric has instructions for IOSRAM, RegisterFile, DPU and Switchbox



# How to program a DRRA fabric?

## Mapping - Assembly implementation

- DRRA programs are written in proto-assembly (PASM)

- DRRA-specific language
- Composable instruction set (**CIS**)
- Resource-centric  
(vs. computation-centric)

$$c = a + b$$



```
SWB {
    connect RF1 port 0 to IOSRAM port Z1
    connect RF2 port 0 to IOSRAM port Z2
    connect DPU port 0 to RF1 port 1
    connect DPU port 1 to RF2 port 1
    connect DPU port 2 to RF3 port 0
    connect RF3 port 1 to IOSRAM2 port X
}

IOSRAM1 {
    read a from input buffer at address X
    write a to own sram at address Y
    read b from input buffer at address X+1
    write b to own sram at address Y+1
    read a from own sram to port Z1
    read b from own sram to port Z2
}

RF1 {           RF2 {
    read a from port 0   read b from port 0
    write a to port 1    write b to port 1
}
}

DPU {           RF3 {
    do addition        read c from port 0
}                   write c to port 1
}

IOSRAM1 {
    read c from port X
    write c to own sram at address Y
    read c from own sram to output port
    write c to output buffer
}
```

# How to program a DRRA fabric?

## Mapping - Assembly implementation

- DRRA programs are written in proto-assembly (PASM)
  - **DRRA-specific** language
  - Composable instruction set (**CIS**)
  - **Resource-centric**
  - Timing is determined via **constraints**
    - You have to manually declare the timing constraints of the operations
  - Example:

```
IOSRAM1 read a before write a to RF1
IOSRAM1 read b before write b to RF2
connect RF1 to DPU before read a
connect RF2 to DPU before read b
connect RF3 to DPU before compute
...
```

# Complete example

```
epoch {
    rop <route0r> (row=0, col=0, slot=0, port=2) {
        route (option=0, sr=0, source=2, target= 0b010000000)
    }
    rop <input_r> (row=0, col=0, slot=1, port=0) {
        dsu (init_addr=0)
        rep (level=0, iter=1, step=2, delay=0)
        rep (level=1, iter=1, step=1, delay=0)
    }
    rop <input_w> (row=0, col=0, slot=1, port=2) {
        dsu (init_addr=0)
        rep (iter=3, step=1, delay=0)
    }
    rop <read_ab> (row=0, col=0, slot=2, port=3) {
        dsu (init_addr=0)
        rep (iter=3, step=1, delay=0)
    }

    rop <routelwr> (row=1, col=0, slot=0, port=2) {
        route (option=0, sr=1, source=1, target= 0b000000000000000110)
    }
    rop <write_a> (row=1, col=0, slot=1, port=2) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=t1)
    }
    rop <write_b> (row=1, col=0, slot=2, port=2) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=t1)
    }
    rop <srb> (row=1, col=0, slot=0, port=0) {
        swb (option=0, channel=4, source=1, target=4)
        swb (option=0, channel=5, source=2, target=5)
        swb (option=0, channel=3, source=4, target=3)
    }
    rop <read_a_seq> (row=1, col=0, slot=1, port=1) {
        dsu (init_addr=0)
        rep (iter=31, step=1, delay=0)
    }
    rop <read_b_seq> (row=1, col=0, slot=2, port=1) {
        dsu (init_addr=0)
        rep (iter=31, step=1, delay=0)
    }
    rop <write_c_seq> (row=1, col=0, slot=3, port=0) {
        dsu (init_addr=0)
        rep (iter=31, step=1, delay=0)
    }
    rop <compute> (row=1, col=0, slot=4, port=0) {
        dpu (mode=7)
    }
    rop <read_c> (row=1, col=0, slot=3, port=3) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=0)
    }

    rop <route2w> (row=2, col=0, slot=0, port=2) {
        route (option=0, sr=1, source=1, target= 0b000000000000000100)
    }
    rop <write_c> (row=2, col=0, slot=2, port=2) {
        dsu (init_addr=0 )
        rep (iter=1, step=1, delay=0)
    }
    rop <output_r> (row=2, col=0, slot=1, port=3) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=0)
    }
    rop <output_w> (row=2, col=0, slot=1, port=1) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=0)
    }

    cstr ("input_r == input_w")
    cstr ("route0r < read_ab")
    cstr ("routelwr < write_a")
    cstr ("routelwr < write_b")
    cstr ("read_ab > input_w")
    cstr ("read_ab.e0[0] == write_a.e0[0]")
    cstr ("read_ab.e0[1] == write_b.e0[0]")
    cstr ("read_ab.e0[2] == write_a.e0[1]")
    cstr ("read_ab.e0[3] == write_b.e0[1]")
    cstr ("write_a < read_a_seq")
    cstr ("write_b < read_b_seq")
    cstr ("swb < read_a_seq")
    cstr ("read_a_seq == read_b_seq")
    cstr ("read_a_seq + 1 > compute")
    cstr ("write_c_seq == read_a_seq + 1")
    cstr ("read_c.e0[0] > write_c_seq.e0[15]")
    cstr ("read_c.e0[1] > write_c_seq.e0[31]")
    cstr ("write_c == read_c")
    cstr ("output_r > write_c")
    cstr ("output_r == output_w")
}
```

# Complete example

- epoch: block of code where operations and constraints are declared
  - epochs execute sequentially

```
epoch {
    rop <route0r> (row=0, col=0, slot=0, port=2) {
        route (option=0, sr=0, source=2, target= 0b010000000)
    }
    rop <input_r> (row=0, col=0, slot=1, port=0) {
        dsu (init_addr=0)
        rep (level=0, iter=1, step=2, delay=0)
        rep (level=1, iter=1, step=1, delay=0)
    }
    rop <input_w> (row=0, col=0, slot=1, port=2) {
        dsu (init_addr=0)
        rep (iter=3, step=1, delay=0)
    }
    rop <read_ab> (row=0, col=0, slot=2, port=3) {
        dsu (init_addr=0)
        rep (iter=3, step=1, delay=0)
    }

    rop <routelwr> (row=1, col=0, slot=0, port=2) {
        route (option=0, sr=1, source=1, target= 0b000000000000000110)
    }
    rop <write_a> (row=1, col=0, slot=1, port=2) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=t1)
    }
    rop <write_b> (row=1, col=0, slot=2, port=2) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=t1)
    }
    rop <swb> (row=1, col=0, slot=0, port=0) {
        swb (option=0, channel=4, source=1, target=4)
        swb (option=0, channel=5, source=2, target=5)
        swb (option=0, channel=3, source=4, target=3)
    }
    rop <read_a_seq> (row=1, col=0, slot=1, port=1) {
        dsu (init_addr=0)
        rep (iter=31, step=1, delay=0)
    }
    rop <read_b_seq> (row=1, col=0, slot=2, port=1) {
        dsu (init_addr=0)
        rep (iter=31, step=1, delay=0)
    }
    rop <write_c_seq> (row=1, col=0, slot=3, port=0) {
        dsu (init_addr=0)
        rep (iter=31, step=1, delay=0)
    }
    rop <compute> (row=1, col=0, slot=4, port=0) {
        dpu (mode=7)
    }
    rop <read_c> (row=1, col=0, slot=3, port=3) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=0)
    }

    rop <route2w> (row=2, col=0, slot=0, port=2) {
        route (option=0, sr=1, source=1, target= 0b000000000000000100)
    }
    rop <write_c> (row=2, col=0, slot=2, port=2) {
        dsu (init_addr=0 )
        rep (iter=1, step=1, delay=0)
    }
    rop <output_r> (row=2, col=0, slot=1, port=3) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=0)
    }
    rop <output_w> (row=2, col=0, slot=1, port=1) {
        dsu (init_addr=0)
        rep (iter=1, step=1, delay=0)
    }

    cstr ("input_r == input_w")
    cstr ("route0r < read_ab")
    cstr ("routelwr < write_a")
    cstr ("routelwr < write_b")
    cstr ("read_ab > input_w")
    cstr ("read_ab.e0[0] == write_a.e0[0]")
    cstr ("read_ab.e0[1] == write_b.e0[0]")
    cstr ("read_ab.e0[2] == write_a.e0[1]")
    cstr ("read_ab.e0[3] == write_b.e0[1]")
    cstr ("write_a < read_a_seq")
    cstr ("write_b < read_b_seq")
    cstr ("swb < read_a_seq")
    cstr ("read_a_seq == read_b_seq")
    cstr ("read_a_seq + 1 > compute")
    cstr ("write_c_seq == read_a_seq + 1")
    cstr ("read_c.e0[0] > write_c_seq.e0[15]")
    cstr ("read_c.e0[1] > write_c_seq.e0[31]")
    cstr ("write_c == read_c")
    cstr ("output_r > write_c")
    cstr ("output_r == output_w")
}
```

# Complete example

- epoch: block of code where operations and constraints are declared
  - epochs execute sequentially
- “rop” or “cop”: resource or control operations
  - they target a specific port of a specific component
  - they can contains multiple instructions

```
epoch {  
    rop <route0r> (row=0, col=0, slot=0, port=2) {  
        route (option=0, sr=0, source=2, target= 0b010000000)  
    }  
    rop <input_r> (row=0, col=0, slot=1, port=0) {  
        dsu (init_addr=0)  
        rep (level=0, iter=1, step=2, delay=0)  
        rep (level=1, iter=1, step=1, delay=0)  
    }  
    rop <input_w> (row=0, col=0, slot=1, port=2) {  
        dsu (init_addr=0)  
        rep (iter=3, step=1, delay=0)  
    }  
    rop <read_ab> (row=0, col=0, slot=2, port=3) {  
        dsu (init_addr=0)  
        rep (iter=3, step=1, delay=0)  
    }  
  
    rop <routelwr> (row=1, col=0, slot=0, port=2) {  
        route (option=0, sr=1, source=1, target= 0b000000000000000110)  
    }  
    rop <write_a> (row=1, col=0, slot=1, port=2) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=t1)  
    }  
    rop <write_b> (row=1, col=0, slot=2, port=2) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=t1)  
    }  
    rop <swb> (row=1, col=0, slot=0, port=0) {  
        swb (option=0, channel=4, source=1, target=4)  
        swb (option=0, channel=5, source=2, target=5)  
        swb (option=0, channel=3, source=4, target=3)  
    }  
    rop <read_a_seq> (row=1, col=0, slot=1, port=1) {  
        dsu (init_addr=0)  
        rep (iter=31, step=1, delay=0)  
    }  
    rop <read_b_seq> (row=1, col=0, slot=2, port=1) {  
        dsu (init_addr=0)  
        rep (iter=31, step=1, delay=0)  
    }  
    rop <write_c_seq> (row=1, col=0, slot=3, port=0) {  
        dsu (init_addr=0)  
        rep (iter=31, step=1, delay=0)  
    }  
    rop <compute> (row=1, col=0, slot=4, port=0) {  
        dpu (mode=7)  
    }  
    rop <read_c> (row=1, col=0, slot=3, port=3) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
  
    rop <route2w> (row=2, col=0, slot=0, port=2) {  
        route (option=0, sr=1, source=1, target= 0b000000000000000100)  
    }  
    rop <write_c> (row=2, col=0, slot=2, port=2) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
    rop <output_r> (row=2, col=0, slot=1, port=3) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
    rop <output_w> (row=2, col=0, slot=1, port=1) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
  
    cstr ("input_r == input_w")  
    cstr ("route0r < read_ab")  
    cstr ("routelwr < write_a")  
    cstr ("routelwr < write_b")  
    cstr ("read_ab > input_w")  
    cstr ("read_ab.e0[0] == write_a.e0[0]")  
    cstr ("read_ab.e0[1] == write_b.e0[0]")  
    cstr ("read_ab.e0[2] == write_a.e0[1]")  
    cstr ("read_ab.e0[3] == write_b.e0[1]")  
    cstr ("write_a < read_a_seq")  
    cstr ("write_b < read_b_seq")  
    cstr ("swb < read_a_seq")  
    cstr ("read_a_seq == read_b_seq")  
    cstr ("read_a_seq + 1 > compute")  
    cstr ("write_c_seq == read_a_seq + 1")  
    cstr ("read_c.e0[0] > write_c_seq.e0[15]")  
    cstr ("read_c.e0[1] > write_c_seq.e0[31]")  
    cstr ("write_c == read_c")  
    cstr ("output_r > write_c")  
    cstr ("output_r == output_w")  
}
```

# Complete example

- epoch: block of code where operations and constraints are declared
  - epochs execute sequentially
- “rop” or “cop”: resource or control operations
  - they target a specific port of a specific component
  - they can contains multiple instructions
- constraints: declare the timing constraints between operations
  - uses the operations names
  - uses conditional logic to build timing relations
  - Example: `cstr("route0r < read_ab")`  
means that `route0r` is executed before `read_ab`

```
epoch {  
    rop <route0r> (row=0, col=0, slot=0, port=2) {  
        route (option=0, sr=0, source=2, target= 0b010000000)  
    }  
    rop <input_r> (row=0, col=0, slot=1, port=0) {  
        dsu (init_addr=0)  
        rep (level=0, iter=1, step=2, delay=0)  
        rep (level=1, iter=1, step=1, delay=0)  
    }  
    rop <input_w> (row=0, col=0, slot=1, port=2) {  
        dsu (init_addr=0)  
        rep (iter=3, step=1, delay=0)  
    }  
    rop <read_ab> (row=0, col=0, slot=2, port=3) {  
        dsu (init_addr=0)  
        rep (iter=3, step=1, delay=0)  
    }  
  
    rop <routelwr> (row=1, col=0, slot=0, port=2) {  
        route (option=0, sr=1, source=1, target= 0b000000000000000110)  
    }  
    rop <write_a> (row=1, col=0, slot=1, port=2) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=t1)  
    }  
    rop <write_b> (row=1, col=0, slot=2, port=2) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=t1)  
    }  
    rop <swb> (row=1, col=0, slot=0, port=0) {  
        swb (option=0, channel=4, source=1, target=4)  
        swb (option=0, channel=5, source=2, target=5)  
        swb (option=0, channel=3, source=4, target=3)  
    }  
    rop <read_a_seq> (row=1, col=0, slot=1, port=1) {  
        dsu (init_addr=0)  
        rep (iter=31, step=1, delay=0)  
    }  
    rop <read_b_seq> (row=1, col=0, slot=2, port=1) {  
        dsu (init_addr=0)  
        rep (iter=31, step=1, delay=0)  
    }  
    rop <write_c_seq> (row=1, col=0, slot=3, port=0) {  
        dsu (init_addr=0)  
        rep (iter=31, step=1, delay=0)  
    }  
    rop <compute> (row=1, col=0, slot=4, port=0) {  
        dpu (mode=7)  
    }  
    rop <read_c> (row=1, col=0, slot=3, port=3) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
  
    rop <route2w> (row=2, col=0, slot=0, port=2) {  
        route (option=0, sr=1, source=1, target= 0b000000000000000100)  
    }  
    rop <write_c> (row=2, col=0, slot=2, port=2) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
    rop <output_r> (row=2, col=0, slot=1, port=3) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
    rop <output_w> (row=2, col=0, slot=1, port=1) {  
        dsu (init_addr=0)  
        rep (iter=1, step=1, delay=0)  
    }  
  
    cstr ("input_r == input_w")  
    cstr ("route0r < read_ab")  
    cstr ("routelwr < write_a")  
    cstr ("route1wr < write_b")  
    cstr ("read_ab > input_w")  
    cstr ("read_ab.e0[0] == write_a.e0[0]")  
    cstr ("read_ab.e0[1] == write_b.e0[0]")  
    cstr ("read_ab.e0[2] == write_a.e0[1]")  
    cstr ("read_ab.e0[3] == write_b.e0[1]")  
    cstr ("write_a < read_a_seq")  
    cstr ("write_b < read_b_seq")  
    cstr ("swb < read_a_seq")  
    cstr ("read_a_seq == read_b_seq")  
    cstr ("read_a_seq + 1 > compute")  
    cstr ("write_c_seq == read_a_seq + 1")  
    cstr ("read_c.e0[0] > write_c_seq.e0[15]")  
    cstr ("read_c.e0[1] > write_c_seq.e0[31]")  
    cstr ("write_c == read_c")  
    cstr ("output_r > write_c")  
    cstr ("output_r == output_w")  
}
```

# How to test my implementation?

## Mapping - Testing

```
mul_32_3_1/
├── arch.json
├── intro.md
└── model_0
    └── main.cpp
└── pasm
    └── 0.pasm
└── README.md
```

# How to test my implementation?

## Mapping - Testing

1. Compose a test folder

```
mul_32_3_1/
├── arch.json
├── intro.md
└── model_0
    └── main.cpp
└── pasm
    └── 0.pasm
└── README.md
```

# How to test my implementation?

## Mapping - Testing

### 1. Compose a test folder

- **folder name** is the test name



# How to test my implementation?

## Mapping - Testing

### 1. Compose a test folder

- **folder name** is the test name
- **fabric description** JSON file



# How to test my implementation?

## Mapping - Testing

### 1. Compose a test folder

- **folder name** is the test name
- **fabric description** JSON file
- **model 0** program



# How to test my implementation?

## Mapping - Testing

### 1. Compose a test folder

- **folder name** is the test name
- **fabric description** JSON file
- **model 0** program
- **PASM** program



# How to test my implementation?

## Mapping - Testing

```
mul_32_3_1/
├── arch.json
└── intro.md
└── model_0
    └── main.cpp
└── pasm
    └── 0.pasm
```

```
paul-OptiPlex-Tower-Plus-7020: .../testcases/drra/arithmetic
kth drra/arithmetic (master) $ vesyla testcase run -d ./mul_512_3_1 -o mul_512_3_1_test|
```

# How to test my implementation?

## Mapping - Testing

- ▶ Compose a test folder

```
mul_32_3_1/
├── arch.json
└── intro.md
└── model_0
    └── main.cpp
└── pasm
    └── 0.pasm
```

```
paul-OptiPlex-Tower-Plus-7020: .../testcases/drra/arithmetic
kth drra/arithmetic (master) $ vesyla testcase run -d ./mul_512_3_1 -o mul_512_3_1_test|
```

# How to test my implementation?

## Mapping - Testing

- ▶ Compose a test folder
- ▶ Run the test
  - ▶ `vesyla testcase run  
-d ./my_test_folder  
-o ./output_test`

```
mul_32_3_1/  
└── arch.json  
└── intro.md  
└── model_0  
    └── main.cpp  
└── pasm  
    └── 0.pasm
```

```
paul-OptiPlex-Tower-Plus-7020: .../testcases/drra/arithmetic  
kth drra/arithmetic (master) $ vesyla testcase run -d ./mul_512_3_1 -o mul_512_3_1_test|
```



# **Final thoughts**

# Use Git

Make my life easier 😔 and yours too...



<https://github.com/silagokth>

 **KTH Silago Team**

6 followers   Sweden   <https://silago.eecs.kth.se>

---

Pinned Customize pins

 **vesyla** Public  
Tool suite for DRRA-2 hardware accelerator  
C++ ⭐ 3 ⚡ 1

 **drra-components** Private  
DRRA Component Library  
C++ ⭐ 1

 **drra-tests** Private  
Test cases for DRRA-2 and other platforms  
Parrot

**Check for vesyla updates here** **Add new components here** **Add your tests here**

**Ask me if you do not have access**

# Use Git

Make my life easier 😔 and yours too...



<https://github.com/silagokth>

 **KTH Silago Team**

6 followers   Sweden   <https://silago.eecs.kth.se>

---

Pinned Customize pins

 **vesyla** Public

Tool suite for DRRA-2 hardware accelerator

• C++   ★ 3   ♫ 1

**Check for vesyla updates here**

 **drra-components** Private

DRRA Component Library

• C++   ★ 1

**Add new components here**

 **drra-tests** Private

Test cases for DRRA-2 and other platforms

• Parrot

**Add your tests here**

**Ask me if you do not have access**

# Use Git

## For drra-tests specifically...

- In testcases/drqa folder
- Use a subfolder for your project
  - testcases/drqa/icp
  - testcases/drqa/resnet
- Create one testcase for each algorithm in the application

```
testcases/
└── drqa/
    ├── arithmetic/
    │   ├── mul_32_1_1/
    │   ├── mul_32_3_1/
    │   └── mul_512_3_1/
    └── fpu/
        ├── dpu16/
        └── fpu16/
```

# Time distribution

How long should each step take?

*Without needing new components*



*If new components are needed*



- Reference implementation
- Memory Layout
- Binding

- Application partitioning
- model 0
- PASM

- Reference implementation
- Memory Layout
- Binding

- Application partitioning
- model 0
- PASM