

# Minion overview

- L1 Data Cache ([L1D\$](#)) - 4 KB, configurable
  - 16 sets, 4 ways, 64 Bytes cache line
  - Non-coherent
  - Configuration through `mcache_control` CSR
  - Configurable [modes](#):
    - Shared
    - Split
    - SCP
- Some logic implemented in MMI (hand-tuned)
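
The L1D geometry above (16 sets, 4 ways, 64-byte lines) can be checked with a short sketch. The set and way counts come from the slide; the address split into tag, set index, and byte offset is the conventional scheme for a set-associative cache and is an assumption here, not a documented detail of the ET-Minion. The sketch also ignores the Shared/Split/SCP mode selection, and the function names are hypothetical.

```python
# Geometry from the slide; the address decomposition is an assumed,
# conventional set-associative layout, not a documented ET-Minion detail.
SETS = 16
WAYS = 4
LINE_BYTES = 64


def cache_size_bytes(sets=SETS, ways=WAYS, line_bytes=LINE_BYTES):
    """Total capacity: sets * ways * bytes per line."""
    return sets * ways * line_bytes


def split_address(addr, sets=SETS, line_bytes=LINE_BYTES):
    """Return (tag, set_index, byte_offset) for a power-of-two geometry."""
    offset_bits = line_bytes.bit_length() - 1  # 6 bits for 64-byte lines
    index_bits = sets.bit_length() - 1         # 4 bits for 16 sets
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    print(cache_size_bytes())      # 4096 bytes, i.e. 4 KB
    print(split_address(0x12345))  # (72, 13, 5)
```

Under these assumptions the low 6 address bits select a byte within the line and the next 4 bits select one of the 16 sets.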

## ET-Minion

- RV64IMFC + Zicsr + Zifencei
- In-Order execution with 2 Hardware Threads



# Minion overview

- Vector Processing Unit ([VPU](#))
  - 8 lanes, 32 bits per lane
  - 1 FMA unit, 2 IMA units, 1 INT unit, 1 TRANS unit
  - Extends the RISC-V FP registers from 32b to 256b
  - Each FP register is like a vector with **8** elements of **32b** each
  - Related [custom extensions](#):
    - Packed Single (**PS**) → FP32, memory and compute operations
    - Packed Integer (**PI**) → Int32/UInt32, compute operations
    - Atomic (**AMO**) → PS and PI AMO operations
    - **Tensor** → Memory, FMA32, FMA16A32, IMA8A32, QUANT, REDUCE operations
  - Some logic implemented in MMI (hand-tuned)
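
To make the register layout concrete, the following sketch models a 256-bit FP register as 8 lanes of FP32 and applies a lane-wise multiply-add. It is a software illustration only: the little-endian lane order, the function names, and the rounding behaviour (the result is computed in double precision and then rounded to FP32, which may not match a true fused operation bit for bit) are assumptions, not the semantics of the PS extension.

```python
import struct

LANES = 8       # from the slide: 8 lanes
LANE_BITS = 32  # from the slide: 32 bits per lane


def to_f32(x):
    """Round a Python float to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack_ps(values):
    """Pack 8 floats into a 32-byte (256-bit) register image.

    Lane 0 in the lowest bytes is an assumed ordering.
    """
    if len(values) != LANES:
        raise ValueError("expected exactly 8 lane values")
    return struct.pack("<8f", *values)


def unpack_ps(reg):
    """Unpack a 32-byte register image into 8 FP32 lane values."""
    return list(struct.unpack("<8f", reg))


def ps_madd(a, b, c):
    """Lane-wise a * b + c over three packed registers (illustrative only)."""
    lanes = zip(unpack_ps(a), unpack_ps(b), unpack_ps(c))
    return pack_ps([to_f32(x * y + z) for x, y, z in lanes])


if __name__ == "__main__":
    a = pack_ps([1.0] * 8)
    b = pack_ps([2.0] * 8)
    c = pack_ps([3.0] * 8)
    print(len(a) * 8)                   # 256 bits
    print(unpack_ps(ps_madd(a, b, c)))  # eight lanes of 5.0
```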




# Neigh overview

- 8 ET-Minions
- L1 Instruction cache
  - Shared among the 8 ET-Minions
  - 128 sets, 4 ways (set associative)
  - 64-byte cache line
- Micro Instruction cache
  - Shared among 4 ET-Minions
  - 1 set, 16 ways (fully associative)
- Cooperative Tensor Load
  - Coalesces multiple Tensor Load requests from different minions to the same memory location into a single request, improving performance and reducing power.
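
The coalescing idea behind Cooperative Tensor Load can be sketched in software as grouping requests by address. This is a behavioural illustration under assumptions: the slide does not say at what granularity addresses are compared, within what time window requests are merged, or how data is returned to each minion, so the sketch simply merges requests with identical addresses. The function name and request format are hypothetical.

```python
def coalesce(requests):
    """Merge (minion_id, address) requests that target the same address.

    Returns a list of (address, [minion_ids]) pairs in first-seen order,
    so each distinct address results in a single memory request.
    """
    merged = {}
    for minion_id, addr in requests:
        merged.setdefault(addr, []).append(minion_id)
    return list(merged.items())


if __name__ == "__main__":
    reqs = [(0, 0x1000), (1, 0x1000), (2, 0x2000), (3, 0x1000)]
    for addr, minions in coalesce(reqs):
        print(hex(addr), minions)
    # 0x1000 [0, 1, 3]
    # 0x2000 [2]
```

In this example four requests collapse into two memory accesses.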

## Neighborhood

- 8 ET-Minions



# Minion Shire overview

- 4 Minion Neighborhoods
- Shire Cache (SC) - 4MB configurable:
  - 4-way set-associative, 64-byte cache lines
  - Typical mode: L3 1MB, L2 512KB, SCP 2.5MB
- Uncacheable Block (UC)
  - ET Fast Local Barrier
  - ET Fast Credit Counter
  - ET Inter Processor Interrupts
  - ET Global Atomics
- Cooperative stores (coalescing buffer)
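
As a quick consistency check on the figures above, the sketch below verifies that the typical partition (L3 1 MB, L2 512 KB, SCP 2.5 MB) fills the 4 MB Shire Cache exactly, and derives the per-shire minion and hardware-thread counts from the numbers on the earlier slides. The helper name is hypothetical, and the real configuration mechanism for the partition is not described in this overview.

```python
KB = 1024
MB = 1024 * KB

SHIRE_CACHE_BYTES = 4 * MB

# "Typical mode" split from the slide.
TYPICAL_PARTITION = {
    "L3": 1 * MB,
    "L2": 512 * KB,
    "SCP": 2 * MB + 512 * KB,  # 2.5 MB
}


def unallocated_bytes(partition, total=SHIRE_CACHE_BYTES):
    """Return the capacity left over, or raise if the partition overflows."""
    used = sum(partition.values())
    if used > total:
        raise ValueError("partition exceeds Shire Cache capacity")
    return total - used


# Counts taken from the slides: 4 neighborhoods per shire,
# 8 minions per neighborhood, 2 hardware threads per minion.
NEIGHS_PER_SHIRE = 4
MINIONS_PER_NEIGH = 8
THREADS_PER_MINION = 2

MINIONS_PER_SHIRE = NEIGHS_PER_SHIRE * MINIONS_PER_NEIGH
THREADS_PER_SHIRE = MINIONS_PER_SHIRE * THREADS_PER_MINION

if __name__ == "__main__":
    print(unallocated_bytes(TYPICAL_PARTITION))  # 0
    print(MINIONS_PER_SHIRE, THREADS_PER_SHIRE)  # 32 64
```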

## Minion Shire

- 4 Minion Neighs
- L2 / L3 Cache
- L2 SCP

