

# Table of Contents

## Bridge Index

**Generated:** 2025-11-23

## Bridge: CSV-Based AXI4 Crossbar Generator - Specification

**Component:** Bridge - CSV-Based AXI4 Crossbar Generator **Version:** 0.90 **Last**

**Updated:** 2025-11-14 **Status:** Phase 2+ Complete - Bridge ID Tracking Fully Implemented

---

## Document Organization

This specification documents the Bridge CSV-based AXI4 crossbar generator, which creates parameterized SystemVerilog crossbars from human-readable CSV configuration files.

### Chapter 0: Quick Start

**Location:** ch00\_quick\_start/

- [01\\_installation.md](#) - Installation and setup
- [02\\_first\\_bridge.md](#) - Creating your first bridge
- [03\\_simulation.md](#) - Simulating the bridge
- [04\\_integration.md](#) - Integration into designs

### Chapter 1: Overview

**Location:** ch01\_overview/

- [01\\_overview.md](#) - What is Bridge, key features, and applications
- [02\\_architecture.md](#) - System architecture and design philosophy
- [03\\_clocks\\_and\\_reset.md](#) - Clocking and reset strategy
- [04\\_acronyms.md](#) - Acronyms and terminology
- [05\\_references.md](#) - References and related documents

## **Chapter 2: Internal Block Architecture**

**Location:** ch02\_blocks/

This chapter documents the internal functional blocks that make up the AXI4 bridge crossbar.

- [01\\_master\\_adapter.md](#) - Master interface adaptation and ID extension
- [02\\_slave\\_router.md](#) - Address decoding and slave selection
- [03\\_crossbar\\_core.md](#) - Core switching fabric architecture
- [04\\_arbitration.md](#) - Multi-master arbitration schemes
- [05\\_id\\_management.md](#) - Transaction ID tracking (CAM/FIFO)
- [06\\_width\\_conversion.md](#) - Data width conversion logic
- [07\\_protocol\\_conversion.md](#) - AXI4-to-APB protocol conversion
- [08\\_response\\_routing.md](#) - Response path routing and merging
- [09\\_error\\_handling.md](#) - Error detection and handling

## **Chapter 3: AXI4 Interface Reference**

**Location:** ch03\_interfaces/

Complete reference for all AXI4 signals, timing, and protocol rules used in the bridge.

- [01\\_axi4\\_signals.md](#) - Write Address Channel (AW) detailed reference
- [02\\_write\\_data.md](#) - Write Data Channel (W) specification
- [03\\_write\\_response.md](#) - Write Response Channel (B) specification
- [04\\_read\\_address.md](#) - Read Address Channel (AR) detailed reference
- [05\\_read\\_data.md](#) - Read Data and Response Channel (R) specification

## **Chapter 4: CSV Generator**

**Location:** ch02\_csv\_generator/

- [04\\_channel\\_specific.md](#) - Channel-specific masters (wr/rd/rw)

## **Chapter 5: Generated RTL**

**Location:** ch03\_generated\_rtl/

- [01\\_module\\_structure.md](#) - Generated module organization
- [02\\_arbiter\\_fsms.md](#) - AW/AR arbiter finite state machines

- [07\\_bridge\\_id\\_tracking.md](#) - Bridge ID tracking for multi-master response routing

**PlantUML Diagrams:** assets/puml/ - [aw\\_arbiter\\_fsm.puml](#) - AW channel arbiter FSM - [ar\\_arbiter\\_fsm.puml](#) - AR channel arbiter FSM

**Graphviz Block Diagrams:** assets/graphviz/ - [bridge\\_2x2.gv](#) - 2 master × 2 slave configuration - [bridge\\_1x4.gv](#) - 1 master × 4 slave configuration

## Chapter 6: Usage Examples

**Location:** ch04\_usage\_examples/

(To be documented - examples of bridge configuration and usage)

---

## Quick Navigation

### For RTL Designers

- Start with Quick Start Guide
- Review Overview
- Study Internal Blocks
- Reference AXI4 Signals

### For System Architects

- Start with Architecture Overview
- Review Crossbar Core
- Study Arbitration
- Review ID Management

### For Verification Engineers

- Start with Internal Blocks
- Review Error Handling
- Study AXI4 Protocol
- Review Generated RTL

### For Protocol Reference

- Complete AXI4 Signal Reference
  - Understand Channel Interactions
  - Study Out-of-Order Behavior
-

## Key Features Documented

### Phase 1 Features (Complete)

- CSV configuration parsing
- Custom signal prefixes per port
- Mixed AXI4/APB protocol support
- Mixed data widths
- Partial connectivity matrices
- Automatic converter identification
- Port generation with custom prefixes

### Phase 2 Features (Complete)

- **Channel-Specific Masters** - Write-only (wr), read-only (rd), full (rw)
- Conditional AXI4 channel generation
- Optimized width converter instantiation
- Channel-aware direct connections
- Resource optimization for dedicated masters

### Phase 2+ Features (Complete - 2025-11-10)

- **Bridge ID Tracking** - Multi-master response routing with bridge\_id
- Per-master unique BRIDGE\_ID parameter
- Slave adapter CAM/FIFO transaction tracking
- Configurable out-of-order support per slave (enable\_ooo)
- Crossbar bridge\_id-based response routing
- Zero-latency FIFO mode for in-order slaves
- 1-cycle CAM mode for out-of-order slaves

### Future Enhancements (Planned)

- **Monitored Interface Versions:** AXI4/AXIL4 master/slave monitor ports for debugging
  - axi4\_master\_rd\_mon.sv / axi4\_master\_wr\_mon.sv
  - axi4\_slave\_rd\_mon.sv / axi4\_slave\_wr\_mon.sv
  - axil4\_master\_rd\_mon.sv / axil4\_master\_wr\_mon.sv
  - axil4\_slave\_rd\_mon.sv / axil4\_slave\_wr\_mon.sv
  - monbus aggregation (similar to stream component monbus\_axil\_group.sv)
- APB converter placeholder implementation
- Slave-side width converter downsize

- Variable-width crossbar internal data path
  - Performance monitoring integration
- 

## Document Conventions

### Notation

- **bold** - Important terms, file names
- `code` - CSV fields, signal names, commands
- *italic* - Emphasis, notes

### Signal Naming

- `{prefix}awaddr` - AXI4 Write Address channel address
- `{prefix}araddr` - AXI4 Read Address channel address
- `{prefix}paddr` - APB address
- `aclk` - AXI4 clock
- `aresetn` - AXI4 active-low reset
- `pclk` - APB clock
- `presetn` - APB active-low reset

### CSV Notation

- `port_name` - Unique identifier for port
  - `channels` - Channel specification: rw/wr/rd
  - N/A - Not applicable field value
- 

## Version History

| Version | Date       | Changes                                                                            |
|---------|------------|------------------------------------------------------------------------------------|
| 1.0     | 2025-10-19 | Initial specification with Phase 1 features                                        |
| 1.1     | 2025-10-26 | Added channel-specific master support (Phase 2)                                    |
| 1.2     | 2025-11-10 | Added bridge ID tracking for multi-master routing (Phase 2+)                       |
| 0.90    | 2025-11-14 | Added Chapter 2 (Internal Blocks) and Chapter 3 (AXI4 Interface Reference) - Draft |

| Version | Date | Changes    |
|---------|------|------------|
|         |      | for Review |

## Related Documentation

- [..../PRD.md](#) - Product Requirements Document
- [..../CLAUDE.md](#) - AI integration guide
- [..../CSV\\_BRIDGE\\_STATUS.md](#) - Implementation status and phase tracking
- [..../TASKS.md](#) - Development tasks
- [..../bin/bridge\\_csv\\_generator.py](#) - Generator source code

---

**Next:** [Chapter 0 - Quick Start](#) or [Chapter 1 - Overview](#)

## Quick Start - Installation

### *Prerequisites*

Before using the Bridge generator, ensure you have the following tools installed:

**Required Tools:** - **Python 3.8+** - Generator runtime - **Verilator 4.0+** - RTL simulation and linting - **CocoTB** - Python-based verification framework - **GNU Make** - Build automation

**Optional Tools:** - **Vivado/Quartus** - FPGA synthesis (if targeting FPGAs) - **WaveDrom CLI** - Waveform diagram generation - **GraphViz** - Block diagram generation

---

### *Python Environment Setup*

#### 1. Create Virtual Environment (Recommended)

```
cd ~/github/rtldesignsherpa
python3 -m venv venv
source venv/bin/activate
```

#### 2. Install Python Dependencies

```
pip install -r requirements.txt
```

**Key Python Packages:** - cocotb - Verification framework - pytest - Test runner - Jinja2 - Template engine (used by generator) - toml - TOML configuration parser

## *Verilator Installation*

## **Ubuntu/Debian:**

```
sudo apt-get update  
sudo apt-get install verilator
```

## **From Source (for latest version):**

```
git clone https://github.com/verilator/verilator
cd verilator
autoconf
./configure
make -j$(nproc)
sudo make install
```

## Verify Installation:

```
verilator --version  
# Should show: Verilator 4.0 or higher
```

## *CocoTB Installation*

## Verify Generator Installation:

```
cd ~/github/rtldesignsherpa  
python projects/components/bridge/bin/bridge_generator.py --help
```

## **Expected Output:**

---

## *Directory Structure Overview*

```
rtldesignsherpa/
└── projects/components/bridge/
    ├── bin/
    │   ├── bridge_generator.py      ← Main generator script
    │   └── bridge_pkg/
    │       └── test_configs/      ← Example configurations
    ├── rtl/                      ← Generated RTL output
    ├── dv/tests/                 ← Test suite
    └── docs/bridge_spec/         ← This documentation
```

---

## *Quick Verification*

**Run a quick test to verify everything is working:**

```
cd projects/components/bridge/dv/tests
pytest test_bridge_2x2_rw.py::test_basic_read -v
```

**Expected Result:** - Test compiles RTL - Simulation runs - Test PASSES ✓

**If the test fails:** - Check Verilator installation: verilator --version - Check Python packages: pip list | grep cocotb - Check you're in virtual environment: which python

---

## *Troubleshooting*

**Problem: ModuleNotFoundError: No module named 'cocotb'**

**Solution:**

```
pip install cocotb pytest-xdist
```

**Problem: verilator: command not found**

**Solution:**

```
sudo apt-get install verilator
# or compile from source (see above)
```

**Problem: Permission denied when running generator**

**Solution:**

```
chmod +x projects/components/bridge/bin/bridge_generator.py
```

## **Problem: Tests hang or timeout**

**Solution:** - Don't use parallel execution with waves: `pytest test_name.py` (not `-n 48`) - See Chapter 5: Verification for safe test execution

---

## *Next Steps*

Once installation is complete: - [02\\_first\\_bridge.md](#) - Generate your first bridge - [03\\_simulation.md](#) - Run simulations - [04\\_integration.md](#) - Integrate into your design

---

## **Installation Complete! ✓**

You're now ready to generate your first AXI4 crossbar bridge.

## **Quick Start - Your First Bridge**

### *Overview*

In this tutorial, you'll generate a simple **2×2 AXI4 crossbar bridge** (2 masters, 2 slaves) in under 5 minutes.

**What You'll Create:** - 2 master interfaces (CPU, DMA) - 2 slave interfaces (DDR, SRAM) - Full AXI4 protocol support - Address-based routing - Generated in ~1000 lines of SystemVerilog

---

### *Step 1: Create Configuration Directory*

```
cd ~/github/rtldesignsherpa/projects/components/bridge  
mkdir -p my_bridges  
cd my_bridges
```

---

### *Step 2: Create TOML Configuration*

Create a file named `my_first_bridge.toml`:

```
bridge_name = "my_first_bridge"
```

```
[[masters]]  
name = "cpu_master"  
prefix = "cpu_m_axi_"
```

```

type = "axi4"
channels = "rw"                      # Read-write (full 5-channel AXI4)
data_width = 64
addr_width = 32
id_width = 4

[[masters]]
name = "dma_master"
prefix = "dma_m_axi_"
type = "axi4"
channels = "rw"
data_width = 64
addr_width = 32
id_width = 4

[[slaves]]
name = "ddr_slave"
prefix = "ddr_s_axi_"
type = "axi4"
channels = "rw"
data_width = 64
addr_width = 32
id_width = 4
base_addr = 0x00000000
size = 0x40000000               # 1GB

[[slaves]]
name = "sram_slave"
prefix = "sram_s_axi_"
type = "axi4"
channels = "rw"
data_width = 64
addr_width = 32
id_width = 4
base_addr = 0x40000000
size = 0x10000000               # 256MB

[[connectivity]]
master = "cpu_master"
slaves = ["ddr_slave", "sram_slave"]

[[connectivity]]
master = "dma_master"
slaves = ["ddr_slave", "sram_slave"]

```

---

### *Step 3: Create Connectivity CSV*

Create a file named `my_first_bridge_connectivity.csv`:

```
,ddr_slave,sram_slave  
cpu_master,1,1  
dma_master,1,1
```

**What This Means:** - 1 = connection allowed - 0 = connection blocked (not used here)  
- Both masters can access both slaves

---

### *Step 4: Generate the Bridge*

**Run the generator:**

```
cd ~/github/rtl_design_sherpa/projects/components/bridge
```

```
python bin/bridge_generator.py \  
  --config my_bridges/my_first_bridge.toml \  
  --connectivity my_bridges/my_first_bridge_connectivity.csv \  
  --output rtl/generated/ \  
  --verbose
```

**Expected Output:**

```
Bridge Generator v2.1
```

```
=====
```

```
Reading configuration: my_bridges/my_first_bridge.toml  
Reading connectivity: my_bridges/my_first_bridge_connectivity.csv  
Validating configuration...  
✓ 2 masters, 2 slaves  
✓ Address ranges validated  
✓ Connectivity matrix validated
```

```
Generating package:
```

```
rtl/generated/my_first_bridge/my_first_bridge_pkg.sv
```

```
Generating adapters:
```

```
✓ cpu_master_adapter.sv  
✓ dma_master_adapter.sv
```

```
Generating crossbar: my_first_bridge_xbar.sv
```

```
Generating top-level: my_first_bridge.sv
```

```
Generating filelist: my_first_bridge.f
```

```
Generation complete! ✓
```

```
Output: rtl/generated/my_first_bridge/
```

---

### *Step 5: Examine Generated Files*

```
cd rtl/generated/my_first_bridge  
ls -la
```

### **Generated Files:**

```
my_first_bridge/  
└── my_first_bridge.sv          # Top-level bridge module  
└── my_first_bridge_pkg.sv      # Package with types  
└── my_first_bridge_xbar.sv     # Crossbar routing logic  
└── cpu_master_adapter.sv       # CPU adapter  
└── dma_master_adapter.sv       # DMA adapter  
└── my_first_bridge.f           # Filelist for compilation
```

### **Quick Look at Top-Level:**

```
module my_first_bridge (  
    input  logic          aclk,  
    input  logic          aresetn,  
  
    // CPU Master Interface (64-bit)  
    input  logic [3:0]    cpu_m_axi_awid,  
    input  logic [31:0]   cpu_m_axi_awaddr,  
    // ... all AXI4 signals  
  
    // DMA Master Interface (64-bit)  
    input  logic [3:0]    dma_m_axi_awid,  
    // ...  
  
    // DDR Slave Interface (64-bit)  
    output logic [3:0]   ddr_s_axi_awid,  
    output logic [31:0]  ddr_s_axi_awaddr,  
    // ...  
  
    // SRAM Slave Interface (64-bit)  
    output logic [3:0]   sram_s_axi_awid,  
    // ...  
);
```

---

### *Step 6: Quick Lint Check*

### **Verify the generated RTL with Verilator:**

```
cd ~/github/rtl_design_sherpa/projects/components/bridge  
  
verilator --lint-only \  
-f rtl/generated/my_first_bridge/my_first_bridge.f \  
rtl/generated/my_first_bridge/my_first_bridge.sv
```

## Expected Output:

```
%Info: my_first_bridge.sv: No lint warnings
```

**If you see warnings:** - Most warnings are informational - Check for actual errors (not warnings) - See Chapter 5: Verification → Troubleshooting

---

## *Understanding the Generated Bridge*

### Architecture:



**Key Features:** - **Adapters:** Handle width conversion, address decode, timing isolation - **Crossbar:** Routes transactions based on address ranges - **Arbitration:** Fair round-robin when both masters target same slave - **ID Tracking:** Supports out-of-order completion via transaction IDs

---

## *Next-Level: Try Different Configurations*

### Example 2: Mixed Data Widths

```
# masters/cpu has 32-bit data
data_width = 32
```

```
# slaves/ddr has 64-bit data
data_width = 64
```

*Generator automatically inserts width converters!*

### Example 3: Channel-Specific Masters

```
# Read-only master (saves 39% ports)
channels = "rd"
```

```
# Write-only master (saves 61% ports)
channels = "wr"
```

*Optimizes for dedicated memory transfer engines!*

### Example 4: APB Slaves

```
[[slaves]]
type = "apb"          # APB slave instead of AXI4
# Generator inserts AXI4-to-APB protocol converter
```

---

## Common Beginner Mistakes

### ✗ Mistake 1: Overlapping Address Ranges

```
base_addr = 0x00000000, size = 0x80000000 # Slave 0
base_addr = 0x40000000, size = 0x80000000 # Slave 1 (OVERLAP!)
```

✓ **Solution:** Generator validates and reports error

### ✗ Mistake 2: Disconnected Masters

```
,ddr_slave,sram_slave
cpu_master,0,0          # CPU can't access anything!
```

✓ **Solution:** At least one connection per master

### ✗ Mistake 3: Mismatched ID Widths

```
masters.id_width = 4    # 4-bit IDs
slaves.id_width = 2     # 2-bit IDs (MISMATCH!)
```

✓ **Solution:** Keep id\_width consistent (or generator extends)

---

## Troubleshooting

**Problem: FileNotFoundError:** `my_first_bridge.toml` - Check file path and name - Ensure you're in the correct directory

**Problem: Address overlap detected** - Check base\_addr and size for each slave - Use non-overlapping ranges

**Problem: Verilator: syntax error** - Check TOML syntax (proper quotes, brackets) - Validate with: `python -c "import toml; toml.load('my_first_bridge.toml')"`

---

## *Next Steps*

Your bridge is generated! Now what?

- [03\\_simulation.md](#) - Test your bridge with CocoTB
  - [04\\_integration.md](#) - Integrate into your SoC
- 

## Congratulations! 🎉

You've generated your first AXI4 crossbar bridge. The entire process took under 5 minutes, and you have production-ready RTL.

**Key Takeaways:** - TOML configuration is human-readable - Generator handles complex logic automatically - Lint-clean RTL generated - Ready for simulation and synthesis

## Quick Start - Simulation

### *Overview*

Now that you've generated your first bridge, let's test it with **CocoTB simulations** to verify correct operation.

**What You'll Do:** - Run basic read/write tests - Verify address decoding - Check arbitration - View waveforms (optional)

**Time Required:** 10-15 minutes

---

### *Pre-Generated Test Suite*

The Bridge component includes a comprehensive test suite in `dv/tests/`. For your `my_first_bridge`, there may not be a pre-generated test, so we'll use an existing test as reference or create a simple one.

#### **Available Pre-Generated Tests:**

```
cd ~/github/rtl_designersherpa/projects/components/bridge/dv/tests  
ls test_bridge_*.py
```

#### **Output:**

```
test_bridge_1x2_rd.py  
test_bridge_1x2_wr.py
```

```
test_bridge_2x2_rw.py      ← Similar to our bridge!
test_bridge_5x3_channels.py
```

---

### *Quick Test: Use Existing 2x2 Bridge*

**Let's test the pre-generated `bridge_2x2_rw` which is similar to your bridge:**

```
cd ~/github/rtldesignsherpa/projects/components/bridge/dv/tests
```

```
# Run basic read test (sequential, no waves)
pytest test_bridge_2x2_rw.py::test_basic_read -v
```

**Expected Output:**

```
===== test session starts =====
platform linux -- Python 3.x
collected 1 item

test_bridge_2x2_rw.py::test_basic_read PASSED [100%]

===== 1 passed in 15.23s =====
```

**What Just Happened:** 1. Verilator compiled the bridge RTL (~1GB model) 2. CocoTB ran simulation 3. Test sent read transaction from master 4. Verified correct data returned 5. Test PASSED ✓

---

### *Understanding Test Output*

**During test execution, you'll see:**

```
0.00ns INFO    cocotb.gpi
gpi_embed.cpp:76  in set_program_name_in_venv      Did not detect
Python virtual environment
...
100.00ns INFO    cocotb.bridge_2x2_rw_tb          Starting
test_basic_read
150.00ns INFO    cocotb.bridge_2x2_rw_tb          Master 0
initiating read to 0x00001000
200.00ns INFO    cocotb.bridge_2x2_rw_tb          Address
decoded to slave 0 (DDR)
250.00ns INFO    cocotb.bridge_2x2_rw_tb          Transaction
complete - data: 0xDEADBEEF
...
```

**Key Events:** - **Address decode:** Which slave was selected - **Transaction timing:** How many cycles - **Data verification:** Expected vs actual

---

### *Run More Tests*

#### **Basic Test Suite:**

```
# Test basic write  
pytest test_bridge_2x2_rw.py::test_basic_write -v  
  
# Test address decode (verifies correct slave selection)  
pytest test_bridge_2x2_rw.py::test_address_decode -v  
  
# Test arbitration (both masters, same slave)  
pytest test_bridge_2x2_rw.py::test_arbitration -v
```

#### **Run All Tests for One Bridge:**

```
pytest test_bridge_2x2_rw.py -v
```

**Expected Results:** - All basic tests should PASS - Total time: 2-5 minutes depending on test count

---

### *CRITICAL: Safe Test Execution*

#### **✗ DANGEROUS - Can Crash Your System:**

```
# DON'T DO THIS!  
pytest test_bridge_2x2_rw.py -n 48 --waves
```

**Why?** - Each test compiles ~1GB Verilator model - -n 48 spawns 48 parallel workers -  $48 \times 1\text{GB} = 48\text{GB+}$  memory usage - Can cause system reboot even with 256GB RAM!

#### **✓ SAFE - Sequential Execution:**

```
# Run tests one at a time  
pytest test_bridge_2x2_rw.py -v  
  
# Or run single test  
pytest test_bridge_2x2_rw.py::test_name -v
```

**See Chapter 5: Verification** for detailed test execution safety guidelines.

---

## *Viewing Waveforms (Optional)*

### **Generate VCD waveform file:**

```
pytest test_bridge_2x2_rw.py::test_basic_read -v --waves
```

**This creates:** - dv/tests/test\_bridge\_2x2\_rw/sim\_build/dump.vcd - Large file (~100MB-1GB depending on test duration)

### **View with GTKWave:**

```
gtkwave dv/tests/test_bridge_2x2_rw/sim_build/dump.vcd
```

**Key Signals to Observe:** - cpu\_m\_axi\_ar\* - Master read address channel - ddr\_s\_axi\_ar\* - Slave read address channel - \*\_arvalid, \*\_arready - Handshake signals - bridge\_2x2\_rw.u\_cpu\_master\_adapter.slave\_select\_ar - Address decode

**Waveform demonstrates:** - Address decode logic - Crossbar routing - Arbitration decisions - Data flow through bridge

---

## *Creating a Custom Test (Advanced)*

If you want to test your my\_first\_bridge specifically:

### **1. Generate Test Template:**

```
cd ~/github/rtl_designersherpa/projects/components/bridge/bin  
  
python bridge_generator.py \  
  --config ..../my_bridges/my_first_bridge.toml \  
  --connectivity ..../my_bridges/my_first_bridge_connectivity.csv \  
  --output ..../rtl/generated/ \  
  --generate-tests
```

**This creates:** - dv/tbclasses/my\_first\_bridge\_tb.py - Testbench class - dv/tests/test\_my\_first\_bridge.py - Test file

### **2. Run Your Custom Test:**

```
cd ~/github/rtl_designersherpa/projects/components/bridge/dv/tests  
pytest test_my_first_bridge.py::test_basic_read -v
```

---

## *Interpreting Test Results*

### ✓ Test PASSED:

```
test_bridge_2x2_rw.py::test_basic_read PASSED
```

- Bridge correctly routed transactions
- Address decode worked
- Data integrity verified
- Ready for next step!

### ✗ Test FAILED:

```
test_bridge_2x2_rw.py::test_basic_read FAILED
FAILED test_bridge_2x2_rw.py::test_basic_read - AssertionError:
Expected 0xDEADBEEF, got 0x00000000
```

**Common Failure Causes:** 1. **Address decode error** - Check base\_addr/size in TOML  
2. **Width mismatch** - Verify data\_width consistency  
3. **Timing issue** - Check for setup/hold violations in waveform  
4. **Generator bug** - Report to maintainers

---

## *Performance Verification*

### Check transaction latency:

```
pytest test_bridge_2x2_rw.py::test_latency -v
```

**Expected Results:** - **Single-beat read:** 2-3 cycles - **Single-beat write:** 2-3 cycles - **Burst transfer:** ~1 cycle/beat after address phase

**Meets Specification:** - Latency  $\leq$  3 cycles for single transactions - Burst efficiency > 95%

---

## *Troubleshooting Simulation Issues*

**Problem:** `ModuleNotFoundError: No module named 'cocotb'`

**Solution:**

```
pip install cocotb pytest-xdist
```

**Problem:** `Verilator: Can't find file: bridge_2x2_rw.sv`

**Solution:**

```
# Regenerate the bridge
cd ~/github/rtldesignsherpa/projects/components/bridge
make -C dv/tests test # Regenerates all bridges
```

**Problem: Test timeout after 120 seconds**

**Solution:** - Check you're not running -n 48 parallel - Increase timeout: pytest --timeout=300 - See Chapter 5: Known Issues

**Problem: Permission denied: dump.vcd****Solution:**

```
# Clean old simulation artifacts
cd dv/tests
make clean-all
# Re-run test
pytest test_bridge_2x2_rw.py::test_basic_read --waves
```

---

*Test Suite Summary*

The Bridge test suite includes:

| Test Category         | Tests | Purpose                        |
|-----------------------|-------|--------------------------------|
| <b>Basic</b>          | 4-5   | Read, write,<br>enable/disable |
| <b>Address Decode</b> | 3-4   | Correct slave<br>selection     |
| <b>Arbitration</b>    | 2-3   | Fair access, no<br>starvation  |
| <b>Burst</b>          | 2-3   | Multiple-beat<br>transactions  |
| <b>Edge Cases</b>     | 3-4   | Back-to-back,<br>overlapping   |
| <b>Stress</b>         | 1-2   | All masters, high<br>traffic   |

**Total per bridge configuration: ~15-20 tests**

---

## *Next Steps*

### **Simulations passing? ✓**

You're ready to integrate!

- [\*\*04\\_integration.md\*\*](#) - Integrate bridge into your SoC design

**Additional Resources:** - **Chapter 2: Blocks** - Understand internal architecture - **Chapter 5: Verification** - Full test suite documentation - **Chapter 7: Troubleshooting** - Debug failed tests

---

### **Simulation Complete! 🎉**

Your bridge has been verified in simulation. The generated RTL is functionally correct and ready for integration or synthesis.

**Key Takeaways:** - CocoTB provides Python-based verification - Tests verify address decode, arbitration, data integrity - Sequential execution is CRITICAL (no -n 48 with waves!) - Waveforms help debug issues - ~15-20 tests per bridge configuration

## [\*\*Quick Start - Integration\*\*](#)

### [\*Overview\*](#)

Now that your bridge is generated and tested, let's integrate it into a SystemVerilog design.

**What You'll Learn:** - How to instantiate the bridge in your RTL - Connect AXI4 masters and slaves - Clock and reset connections - File list integration

**Time Required:** 10-15 minutes

---

### [\*Prerequisites\*](#)

Before integration: - ✓ Bridge generated (see [02\\_first\\_bridge.md](#)) - ✓ Tests passing (see [03\\_simulation.md](#)) - ✓ Understanding of your SoC architecture - ✓ AXI4 master/slave components ready

---

## *Integration Steps Overview*

1. Add bridge files to your project
  2. Instantiate bridge in your top-level
  3. Connect masters and slaves
  4. Connect clocks and resets
  5. Compile and verify
- 

### *Step 1: Add Bridge Files to Project*

#### **Option A: Use Generated Filelist**

Your generated bridge includes a .f filelist:

```
rtl/generated/my_first_bridge/my_first_bridge.f
```

#### **Contents:**

```
// Bridge package
rtl/generated/my_first_bridge/my_first_bridge_pkg.sv

// Bridge components
rtl/generated/my_first_bridge/cpu_master_adapter.sv
rtl/generated/my_first_bridge/dma_master_adapter.sv
rtl/generated/my_first_bridge/my_first_bridge_xbar.sv
rtl/generated/my_first_bridge/my_first_bridge.sv

// Dependencies (if any)
rtl/amba/axi4_slave_wr.sv      // Timing isolation wrappers
rtl/amba/axi4_slave_rd.sv      // Skid buffers
rtl/common/skid_buffer.sv
// ... other dependencies
```

#### **Add to your compilation:**

```
# Verilator
verilator -f rtl/generated/my_first_bridge/my_first_bridge.f
your_top.sv

# Vivado
read_verilog -f rtl/generated/my_first_bridge/my_first_bridge.f

# VCS
vcs -f rtl/generated/my_first_bridge/my_first_bridge.f
```

#### **Option B: Manual File Addition**

If you prefer manual control, add individual files to your project's compile order.

---

## Step 2: Instantiate Bridge in Your Design

### Example: SoC Top-Level

```
module my_soc_top (
    input logic          clk,
    input logic          rst_n,
    // ... other ports
);

// =====
// Clock and Reset
// =====
logic aclk;
logic aresetn;

assign aclk = clk;
assign aresetn = rst_n;

// =====
// AXI4 Signal Declarations
// =====

// CPU Master Signals (connects to CPU core)
logic [3:0]  cpu_m_axi_awid;
logic [31:0] cpu_m_axi_awaddr;
logic [7:0]  cpu_m_axi_awlen;
logic [2:0]  cpu_m_axi_awsize;
logic [1:0]  cpu_m_axi_awburst;
logic        cpu_m_axi_awlock;
logic [3:0]  cpu_m_axi_awcache;
logic [2:0]  cpu_m_axi_awprot;
logic        cpu_m_axi_awvalid;
logic        cpu_m_axi_awready;
// ... (W, B, AR, R channels)

// DMA Master Signals (connects to DMA engine)
logic [3:0]  dma_m_axi_awid;
// ... (all AXI4 channels)

// DDR Slave Signals (connects to DDR controller)
logic [3:0]  ddr_s_axi_awid;
logic [31:0] ddr_s_axi_awaddr;
// ... (all AXI4 channels)

// SRAM Slave Signals (connects to SRAM controller)
logic [3:0]  sram_s_axi_awid;
// ... (all AXI4 channels)
```

```

// =====
// Master Components
// =====

// CPU Core (generates AXI4 master transactions)
cpu_core u_cpu (
    .clk          (aclk),
    .rst_n        (aresetn),
    // AXI4 Master Interface
    .m_axi_awid   (cpu_m_axi_awid),
    .m_axi_awaddr  (cpu_m_axi_awaddr),
    // ... connect all AXI4 signals
);

// DMA Engine (generates AXI4 master transactions)
dma_engine u_dma (
    .clk          (aclk),
    .rst_n        (aresetn),
    // AXI4 Master Interface
    .m_axi_awid   (dma_m_axi_awid),
    // ... connect all AXI4 signals
);

// =====
// BRIDGE INSTANTIATION
// =====

my_first_bridge u_bridge (
    .aclk          (aclk),
    .aresetn      (aresetn),

    // CPU Master Interface (inputs to bridge)
    .cpu_m_axi_awid  (cpu_m_axi_awid),
    .cpu_m_axi_awaddr (cpu_m_axi_awaddr),
    .cpu_m_axi_awlen  (cpu_m_axi_awlen),
    .cpu_m_axi_awsize (cpu_m_axi_awsize),
    .cpu_m_axi_awburst (cpu_m_axi_awburst),
    .cpu_m_axi_awlock  (cpu_m_axi_awlock),
    .cpu_m_axi_awcache (cpu_m_axi_awcache),
    .cpu_m_axi_awprot  (cpu_m_axi_awprot),
    .cpu_m_axi_awvalid (cpu_m_axi_awvalid),
    .cpu_m_axi_awready (cpu_m_axi_awready),
    // ... (W, B, AR, R channels)

    // DMA Master Interface (inputs to bridge)
    .dma_m_axi_awid  (dma_m_axi_awid),
    // ... (all DMA AXI4 signals)

    // DDR Slave Interface (outputs from bridge)

```

```

.ddr_s_axi_awid      (ddr_s_axi_awid),
.ddr_s_axi_awaddr    (ddr_s_axi_awaddr),
// ... (all DDR AXI4 signals)

// SRAM Slave Interface (outputs from bridge)
.sram_s_axi_awid    (sram_s_axi_awid),
// ... (all SRAM AXI4 signals)
);

// =====
// Slave Components
// =====

// DDR Controller (receives AXI4 slave transactions)
ddr_controller u_ddr (
    .clk          (aclk),
    .rst_n        (aresetn),
    // AXI4 Slave Interface
    .s_axi_awid   (ddr_s_axi_awid),
    .s_axi_awaddr (ddr_s_axi_awaddr),
    // ... connect all AXI4 signals
);

// SRAM Controller (receives AXI4 slave transactions)
sram_controller u_sram (
    .clk          (aclk),
    .rst_n        (aresetn),
    // AXI4 Slave Interface
    .s_axi_awid   (sram_s_axi_awid),
    // ... connect all AXI4 signals
);

endmodule

```

---

### Step 3: Signal Connection Checklist

#### Master Interface Connections (Bridge Inputs):

| Signal Type | Count       | Direction | Notes                  |
|-------------|-------------|-----------|------------------------|
| AW Channel  | ~12 signals | Input     | Write address channel  |
| W Channel   | ~4 signals  | Input     | Write data channel     |
| B Channel   | ~3 signals  | Output    | Write response channel |

| Signal Type | Count       | Direction | Notes                |
|-------------|-------------|-----------|----------------------|
| AR Channel  | ~12 signals | Input     | Read address channel |
| R Channel   | ~5 signals  | Output    | Read data channel    |

### Slave Interface Connections (Bridge Outputs):

| Signal Type | Count       | Direction | Notes                  |
|-------------|-------------|-----------|------------------------|
| AW Channel  | ~12 signals | Output    | Write address channel  |
| W Channel   | ~4 signals  | Output    | Write data channel     |
| B Channel   | ~3 signals  | Input     | Write response channel |
| AR Channel  | ~12 signals | Output    | Read address channel   |
| R Channel   | ~5 signals  | Input     | Read data channel      |

**Total Signals (2 masters × 2 slaves):** - ~200-250 signal connections - Generator creates consistent naming for easy connection

---

### Step 4: Clock and Reset Strategy

#### Single Clock Domain (Simplest):

```
// All components share same clock
assign aclk = system_clk;
assign aresetn = system_rst_n;

my_first_bridge u_bridge (
    .aclk    (aclk),
    .aresetn (aresetn),
    // ...
);
```

#### Multiple Clock Domains (CDC Required):

```
// Masters on fast clock
assign master_aclk = cpu_clk;      // 400 MHz
```

```

// Slaves on slow clock
assign slave_aclk = mem_clk;           // 200 MHz

// Bridge on fast clock with CDC at outputs
my_first_bridge u_bridge (
    .aclk      (master_aclk),
    .aresetn   (master_aresetn),
    // ...
);

// CDC FIFOs between bridge and slaves
axi_cdc_fifo u_ddr_cdc (
    .s_aclk      (master_aclk),
    .s_aresetn   (master_aresetn),
    .s_axi_*     (ddr_s_axi_*),      // From bridge

    .m_aclk      (slave_aclk),
    .m_aresetn   (slave_aresetn),
    .m_axi_*     (ddr_cdc_s_axi_*), // To DDR controller
);

```

**Reset Sequence:** 1. Assert aresetn = 0 for at least 10 clock cycles 2. De-assert aresetn = 1 on clock edge 3. Bridge clears all internal state 4. Ready for transactions

---

### Step 5: Address Map Verification

**Verify your configuration matches your SoC address map:**

```

// In your TOML configuration:
[[slaves]]
name = "ddr_slave"
base_addr = 0x00000000      # Must match DDR base address!
size = 0x40000000          # Must match DDR size!

[[slaves]]
name = "sram_slave"
base_addr = 0x40000000      # Must match SRAM base address!
size = 0x10000000          # Must match SRAM size!

```

### Address Decode Logic:

CPU writes to 0x00001000:  
→ Address decoder: 0x00001000 in [0x00000000, 0x40000000]?  
→ YES → Route to DDR slave  
→ Transaction reaches DDR controller

```
CPU writes to 0x40000100:  
→ Address decoder: 0x40000100 in [0x40000000, 0x50000000)?  
→ YES → Route to SRAM slave  
→ Transaction reaches SRAM controller  
  
CPU writes to 0x80000000:  
→ Address decoder: No match found  
→ Bridge returns DECERR (decode error)  
→ Transaction fails gracefully
```

---

### *Step 6: Compile and Verify*

#### **Synthesis Check (Vivado Example):**

```
# Add bridge files  
read_verilog -f rtl/generated/my_first_bridge/my_first_bridge.f  
  
# Add your design  
read_verilog my_soc_top.sv  
  
# Synthesize  
synth_design -top my_soc_top -part xcvu9p-flga2104-2-i  
  
# Check for errors  
report_timing_summary  
report_utilization
```

**Expected Results:** - No syntax errors - No unconnected ports - Timing clean (or with known violations) - LUT/FF usage as expected (~2K-5K LUTs for 2x2 bridge)

---

### *Common Integration Issues*

#### **Issue 1: Port Width Mismatch**

Error: Port 'cpu\_m\_axi\_awaddr' is 32 bits, but signal 'cpu\_awaddr' is 64 bits

**Solution:** - Check TOML addr\_width matches your component's address width - Regenerate bridge with correct widths

#### **Issue 2: Unconnected Signals**

Warning: Signal 'cpu\_m\_axi\_awqos' is unconnected

**Solution:** - AXI4 QoS signals may be optional - Tie to appropriate values if unused: assign cpu\_m\_axi\_awqos = 4'b0;

### Issue 3: Timing Violation

Critical Path: cpu\_master\_adapter → crossbar (failed by 0.5 ns)

**Solution:** - See Chapter 6: Performance for pipeline options - Consider adding FIFOs at adapter outputs - Reduce clock frequency temporarily

---

### *Integration Checklist*

Before releasing your design:

- All bridge files added to compilation
  - Bridge instantiated in top-level
  - All masters connected (inputs)
  - All slaves connected (outputs)
  - Clock connected to aclk
  - Reset connected to aresetn
  - Address map verified against TOML
  - No unconnected ports
  - Synthesis clean (no errors)
  - Timing analyzed (no critical violations)
  - Simulation clean (if available)
- 

### *Example Projects*

#### Reference Implementations:

```
projects/components/bridge/examples/  
└── simple_2x2/          # Basic 2 master, 2 slave  
└── cpu_with_dma/        # CPU + DMA + memory hierarchy  
└── soc_integration/    # Full SoC with peripherals
```

See Chapter 4: Programming → Integration Examples for complete working examples.

---

## *Next Steps*

### **Integration complete? ✓**

You now have a working SoC with AXI4 crossbar!

**Additional Resources:** - **Chapter 1: Overview** - Understand bridge architecture - **Chapter 2: Blocks** - Detailed micro-architecture - **Chapter 3: Interfaces** - Complete signal specifications - **Chapter 4: Programming** - Advanced configuration - **Chapter 6: Performance** - Optimization and pipelining

---

## *Troubleshooting Resources*

**Having issues?** - **Chapter 5: Verification** - Test execution and debugging - **Appendix A: Generator** - Understanding the generator internals - **Appendix B: Internals** - Advanced development topics

---

## **Integration Complete! 🎉**

Your bridge is now integrated into your SoC design. You've completed the Quick Start guide!

**What You've Accomplished:** - ✓ Installed tools and dependencies - ✓ Generated your first bridge (2x2 configuration) - ✓ Ran simulations and verified functionality - ✓ Integrated bridge into SystemVerilog design - ✓ Ready for synthesis or further development

**Key Takeaways:** - Bridge generator simplifies AXI4 crossbar creation - Generated RTL is production-ready - Integration is straightforward with consistent signal naming - Address map configuration is critical - CocoTB provides comprehensive verification

---

## **Welcome to the Bridge Component!**

You're now ready to create complex SoC interconnects with confidence. Explore the full specification for advanced features: - Mixed data widths - Channel-specific masters (rd/wr optimization) - APB protocol conversion - Deep pipeline options - And much more!

## Bridge - Overview

### *Introduction*

The **Bridge** component is a Python-based AXI4 crossbar generator that produces high-performance, parameterized SystemVerilog RTL for connecting multiple AXI4 masters to multiple AXI4 slaves. Named after the infrastructure that connects different regions, Bridge creates efficient memory-mapped interconnects from human-readable TOML and CSV configuration files.

**Design Philosophy:** Simple, performant, and pragmatic. Bridge focuses on common use cases rather than attempting to support every edge case of the AXI4 specification. Hard limits (8-bit ID width, 64-bit addresses) eliminate unnecessary complexity while supporting all realistic hardware designs.

**Status:** ✓ **Phase 2+ Complete** - Bridge ID tracking fully implemented, all test configurations passing

---

### *Key Features*

**Core Functionality:** - **Configurable Topology:** 1-32 masters × 1-256 slaves - **Full AXI4 Protocol:** All 5 channels (AW, W, B, AR, R) - **Automatic Generation:** ~1000 lines of RTL from 20 lines of TOML - **Address-Based Routing:** Configurable address decode per slave - **Fair Arbitration:** Round-robin when multiple masters target same slave - **ID-Based Tracking:** Out-of-order transaction support via bridge\_id - **Burst Support:** Full AXI4 burst protocol (INCR, WRAP, FIXED)

**Advanced Features:** - **Mixed Data Widths:** Automatic width conversion (32b ↔ 64b ↔ 128b ↔ 256b ↔ 512b) - **Channel-Specific Masters:** Write-only (wr), read-only (rd), or full (rw) - **Mixed Protocols:** AXI4 + APB slave support (protocol converters auto-inserted) - **Bridge ID Tracking:** Per-master unique IDs for multi-master response routing - **Configurable OOO:** Per-slave out-of-order enable with CAM/FIFO tracking - **Python Automation:** Comprehensive generator with Jinja2 templates

**Resource Optimization:** - Direct connections for matching widths (zero overhead) - Converters only where needed (not fixed-width crossbar) - Channel-specific masters reduce port count by 39%-61% - Skid buffers for backpressure handling

---

## *Applications*

**Multi-Core Processors:** - CPU cores + GPU accessing shared memory hierarchy - Cache-coherent interconnect for multi-core systems - High-bandwidth memory access for accelerators - Independent read/write paths prevent head-of-line blocking

**DMA and Streaming:** - DMA engines with dedicated memory paths - Channel-specific masters (wr/rd) for streaming applications - Multiple concurrent transfers without interference - Burst optimization for high-throughput data movement

**SoC Integration:** - Memory-mapped peripheral interconnect - Mixed AXI4 and APB slave attachment - Scalable to 32 masters × 256 slaves - Standard AXI4 interfaces for IP reuse

**FPGA System Integration:** - MicroBlaze/Nios II CPU interconnect - Custom accelerator fabric - Standard Xilinx/Intel IP integration - Performance counters for profiling

---

## *Comparison with Other Crossbars*

| Feature              | APB Crossbar        | AXI-Stream (Delta) | Bridge (AXI4)            |
|----------------------|---------------------|--------------------|--------------------------|
| <b>Protocol</b>      | Simple register bus | Streaming data     | Memory-mapped burst      |
| <b>Channels</b>      | 1                   | 1 (data stream)    | 5 (AW, W, B, AR, R)      |
| <b>Routing</b>       | Address decode      | TDEST decode       | Address decode           |
| <b>Out-of-Order</b>  | No                  | No                 | <b>Yes (ID-based)</b>    |
| <b>Burst Support</b> | No                  | Packet (TLAST)     | <b>Yes (AWLEN/ARLEN)</b> |
| <b>Use Case</b>      | Control registers   | Data streaming     | <b>Memory-mapped I/O</b> |
| <b>Complexity</b>    | Low                 | Medium             | <b>High</b>              |
| <b>Latency</b>       | 1-2 cycles          | 2 cycles           | <b>2-3 cycles</b>        |

**Bridge Sweet Spot:** High-performance memory-mapped interconnects where out-of-order completion and burst efficiency are critical.

---

## *Development Status*

**Phase 1: CSV Configuration ✓ COMPLETE** (2025-10-19) - CSV parser (ports.csv, connectivity.csv) - Custom port prefixes per port - Mixed AXI4/APB protocol support - Automatic converter identification - Partial connectivity matrices

**Phase 2: Channel-Specific Masters ✓ COMPLETE** (2025-10-26) - channels field: rw/wr/rd support - Conditional port generation based on channels - Resource optimization (39%-61% port reduction) - Width converter awareness - Example: 4-master bridge saves 35% signals

**Phase 2+: Bridge ID Tracking ✓ COMPLETE** (2025-11-10) - Per-master unique BRIDGE\_ID parameter - Slave adapter CAM/FIFO transaction tracking - Configurable out-of-order support per slave (enable\_ooo) - Crossbar bridge\_id-based response routing - Zero-latency FIFO mode for in-order slaves - 1-cycle CAM mode for out-of-order slaves

**Phase 3: APB Converter Implementation ⏳ PENDING** - AXI2APB converter module integration - APB signal packing/unpacking logic - End-to-end testing with APB slaves - Status: Placeholders in generated code with TODO comments

**Future Enhancements** (Planned) - Deeper pipeline options (>400 MHz Fmax) - Configurable skid buffer depth (1-8 stages) - Optional FIFO insertion at adapter outputs - Python GUI for TOML configuration (prevent illegal configs) - Variable-width crossbar internal data path - Performance monitoring integration

---

## *Test Coverage*

### All Configurations Passing:

| Config    | Masters | Slaves | Channels | CDC | Status |
|-----------|---------|--------|----------|-----|--------|
| 1x2 Read  | 1       | 2      | rd       | No  | ✓ 100% |
| 1x2 Write | 1       | 2      | wr       | No  | ✓ 100% |
| 2x2 RW    | 2       | 2      | rw       | No  | ✓ 100% |
| 3x5 RW    | 3       | 5      | rw       | No  | ✓ 100% |
| 4x4 RW    | 4       | 4      | rw       | No  | ✓ 100% |
| 5x3       | 5       | 3      | mixed    | No  | ✓ 100% |

| Config | Masters | Slaves | Channels | CDC | Status |
|--------|---------|--------|----------|-----|--------|
| Mixed  |         |        |          |     |        |

**Test Levels:** - **Basic (4-5 tests):** Register access, read/write, enable/disable - **Medium (5-6 tests):** Address decode, arbitration, bursts - **Full (3-4 tests):** Stress tests, edge cases, multi-master

**Known Issues:** - ⚠ Parallel test execution with waves causes system reboot (fixed in conftest.py) - ⚠ APB converter pending (Phase 3 implementation)

---

### *Performance Characteristics*

**Latency (2x2 Bridge, 64-bit Data):** - Single-beat read: **2-3 cycles** - Single-beat write: **2-3 cycles** - Burst transfer: **~1 cycle/beat** after address phase

**Throughput:** - Concurrent transfers: **All M×N paths simultaneously** - Burst efficiency: >95% (minimal protocol overhead) - Direct connections: **Zero-latency** for matching widths

**Resource Usage (Estimated, Post-Synthesis):** - 2x2 bridge (64-bit): ~2,500 LUTs, ~3,000 FFs - 4x4 bridge (64-bit): ~5,000 LUTs, ~6,000 FFs - Scaling: ~150 LUTs per M×S connection

**Timing:** - Target Fmax: **300-400 MHz** on UltraScale+ FPGAs - Critical paths: Arbitration logic, address decode - Pipeline options: Configurable skid buffers, optional FIFOs

---

### *Design Decisions*

**Why Python Generator?** - **Flexibility:** Easy to extend and modify - **Jinja2 Templates:** Clean separation of logic and RTL generation - **Validation:** Config validation before RTL generation - **Automation:** Batch generation, test generation, documentation - **Maintainability:** Single source of truth for all configurations

**Why TOML Configuration?** - **Human-Readable:** More intuitive than JSON or XML - **Type-Safe:** Built-in validation - **Comments:** Self-documenting configurations - **Arrays:** Natural representation of masters/slaves - **CSV Companion:** Connectivity matrix for complex topologies

**Why 64-Bit Internal Path?** - **Common Width:** Most modern systems use 64-bit or wider - **Minimal Conversion:** Most masters/slaves are 32-bit or 64-bit - **Performance:** Wider internal path reduces conversion stages - **Simplicity:** Single internal width simplifies logic

**Why Hard Limits (8-bit ID, 64-bit Addr)?** - **Realism:** 256 outstanding transactions per master is sufficient - **Complexity:** Variable ID width adds considerable logic - **Performance:** Eliminates conversion logic in critical paths - **Pragmatism:** Focus on what hardware actually needs

**Why Channel-Specific Masters?** - **Resource Optimization:** DMA engines are often unidirectional - **Port Reduction:** 39% for write-only, 61% for read-only - **Clarity:** Explicitly documents master capabilities - **Synthesis:** Simpler logic for dedicated paths

---

### *Documentation Organization*

This specification is organized into chapters:

- **Chapter 0:** Quick Start - Get running in 30 minutes
  - **Chapter 1 (this chapter):** Overview - Features, status, design philosophy
  - **Chapter 2:** Blocks - Micro-architecture details (adapters, crossbar, converters)
  - **Chapter 3:** Interfaces - Port specifications and parameters
  - **Chapter 4:** Programming - Generator usage, TOML format, signal naming
  - **Chapter 5:** Verification - Test suite, execution, troubleshooting
  - **Chapter 6:** Performance - Latency analysis, pipeline enhancement
  - **Appendix A:** Generator Deep Dive - Python internals for developers
  - **Appendix B:** Internals - Contributing, advanced development
- 

### *Comparison with Commercial IP*

#### **Bridge (Open Source) vs. Commercial AXI4 Crossbars:**

| Feature              | Commercial       | <b>Bridge</b>    |
|----------------------|------------------|------------------|
| <b>Complexity</b>    | Feature-complete | Pragmatic subset |
| <b>ID Width</b>      | Configurable     | 8-bit (fixed)    |
| <b>Address Width</b> | Configurable     | 64-bit (fixed)   |

| Feature               | Commercial          | <b>Bridge</b>              |
|-----------------------|---------------------|----------------------------|
| <b>Data Width</b>     | Configurable        | Configurable (32-512b)     |
| <b>Cost</b>           | \$\$\$ License fees | <b>Free (open source)</b>  |
| <b>Customization</b>  | Limited             | <b>Full Python source</b>  |
| <b>Learning Curve</b> | Steep               | <b>Quick Start: 30 min</b> |
| <b>Verification</b>   | Commercial          | <b>CocoTB + pytest</b>     |

**When to Use Bridge:** - Early prototyping and evaluation - Educational purposes and learning - Projects with typical configurations (not edge cases) - Open-source or cost-sensitive projects - Need full control over generator

**When to Use Commercial:** - Need every AXI4 spec feature - Require vendor support and warranty - Variable ID width per port is critical - Legacy compatibility requirements

---

#### *Version History*

| Version    | Date       | Milestone | Changes                             |
|------------|------------|-----------|-------------------------------------|
| <b>1.0</b> | 2025-10-19 | Phase 1   | CSV configuration, basic generation |
| <b>1.1</b> | 2025-10-26 | Phase 2   | Channel-specific masters (wr/rd/rw) |
| <b>1.2</b> | 2025-11-10 | Phase 2+  | Bridge ID tracking, OOO support     |
| <b>2.0</b> | TBD        | Phase 3   | APB converter implementation        |

## *Related Components*

**Within rtldesignsherpa Project:** - **APB Crossbar** - Simple register bus crossbar - **Delta (AXI-Stream)** - Streaming data crossbar - **HPET** - High precision event timer (APB peripheral) - **RAPIDS** - DMA engine with AXI4 masters

**External Dependencies:** - **axi4\_slave\_wr/rd** - Timing isolation wrappers - **skid\_buffer** - Backpressure buffering - **bridge\_cam** - CAM for bridge ID tracking

---

## *Getting Started*

**New Users:** 1. Start with [Chapter 0: Quick Start](#) 2. Generate your first bridge in 5 minutes 3. Run simulations to verify 4. Integrate into your design

**Experienced Users:** - [Chapter 2: Blocks](#) - Understand micro-architecture - [Chapter 4: Programming](#) - Advanced configuration - [Appendix A](#) - Generator internals

---

**Next:** [Chapter 1.2 - Architecture](#)

---

**Task:** Python GUI for TOML Configuration **Priority:** Medium **Effort:** 2-3 weeks

**Purpose:** Prevent illegal configurations, provide visual editor **Features:** - Master/slave port configuration with validation - Address map visualization (prevent overlaps) - Connectivity matrix editor - Real-time TOML preview - Configuration templates - Export to TOML/CSV for generator

## [\*\*Bridge - Architecture\*\*](#)

### [\*High-Level Architecture\*](#)

The Bridge component implements a **full M×N AXI4 crossbar** with address-based routing and ID-based response demultiplexing. The architecture consists of three main layers: adapters (per-master), crossbar (interconnect matrix), and protocol/width converters.

**Design Philosophy:** - **Per-Master Adapters:** Independent data paths for each master - **Shared Crossbar:** Address-based route selection and arbitration -

**Minimal Conversion:** Direct paths for matching widths, converters only where needed - **Pipeline Flexibility:** Configurable skid buffers and optional FIFOs

---

### Block Diagram



#### Adapter Functions:

- Timing isolation
- Width conversion (32→64)
- Address decode
- Bridge ID assignment

#### Crossbar Functions:

- Address decode
- Arbitration (per slave)
- ID-based routing
- Width conversion (64→slave)

**Key Observation:** Internal 64-bit data path simplifies logic while supporting variable-width masters and slaves.

---

### Module Hierarchy

```
bridge_top.sv (Top-Level)
  └── bridge_pkg.sv (Package - Types and Structures)
    ├── AXI4 channel types (axi4_ar_t, axi4_r_t, etc.)
    ├── Data structures (64-bit internal width)
    └── Bridge ID definitions

  └── master_adapters/ (One per Master)
    ├── cpu_master_adapter.sv
      ├── axi4_slave_rd (Timing isolation)
      ├── axi4_slave_wr (Timing isolation)
      ├── Address decoder (Which slave?)
      ├── Width converter (32→64 if needed)
      └── Bridge ID injector (Master→bridge_id)

    └── gpu_master_adapter.sv
    └── dma_master_adapter.sv
```

```

bridge_xbar.sv (Crossbar Interconnect)
  └── Arbitration logic (per-slave, per-channel)
    ├── AW channel arbiters
    └── AR channel arbiters

  └── Channel multiplexers (master selection)
    ├── AW/W/AR channel muxes
    └── B/R channel demuxes (ID-based)

  └── Slave adapters (per-slave)
    ├── bridge_cam.sv (Bridge ID → Master ID tracking)
    ├── Width converter (64→slave width)
    └── Protocol converter (AXI4→APB if needed)

  └── Backpressure management
    ├── Skid buffers (configurable depth)
    └── Optional FIFOs (for deep pipelining)

Dependencies
  └── rtl/amba/axi4_slave_rd.sv (Read channel wrapper)
  └── rtl/amba/axi4_slave_wr.sv (Write channel wrapper)
  └── rtl/common/skid_buffer.sv (Backpressure buffer)
  └── rtl/bridge/bridge_cam.sv (CAM for ID tracking)

```

---

### *Adapter Architecture (Per-Master)*

**Purpose:** Isolate master from crossbar, perform width/address conversion

#### **CPU Master Adapter Example:**

CPU Master (External 32-bit)

- [AXI4 Slave Wrapper] → Timing isolation, protocol compliance  
(`axi4_slave_rd/wr`)
- [Address Decoder] → Generate `slave_select` signals  
`base_addr`/`size` check      `slave_select_ar[N-1:0]`  
                                        `slave_select_aw[N-1:0]`
- [Width Converter] → 32-bit → 64-bit conversion  
(if master ≠ 64-bit)      Upsizing logic
- [Bridge ID Injector] → Add master's unique `bridge_id`  
`BRIDGE_ID` parameter      to every transaction
- [64-bit Internal Path] → To crossbar

```

cpu_master_64b_ar
cpu_master_64b_r
cpu_master_64b_aw
cpu_master_64b_w
cpu_master_64b_b

```

### Key Adapter Signals:

| Signal                   | Width  | Direction | Purpose                     |
|--------------------------|--------|-----------|-----------------------------|
| {master}_m_ax_i_*        | Native | Input     | External master interface   |
| slave_select_s_ar[S-1:0] |        | Output    | Read address decode result  |
| slave_select_s_aw[S-1:0] |        | Output    | Write address decode result |
| {master}_64b_ar          | Struct | Output    | 64-bit internal read addr   |
| {master}_64b_r           | Struct | Input     | 64-bit internal read data   |
| {master}_64b_aw          | Struct | Output    | 64-bit internal write addr  |
| {master}_64b_w           | Struct | Output    | 64-bit internal write data  |
| {master}_64b_b           | Struct | Input     | 64-bit internal write resp  |

### Crossbar Architecture

**Purpose:** Route transactions between adapters and slaves based on address decode and arbitration

### Crossbar Structure:





Response Path (ID-based routing):



**Arbitration Per Slave:** - Each slave has independent AW and AR arbiters - Round-robin policy (fair access, no starvation) - Grant locked during burst (until xlast) - Response channels routed by bridge\_id (no arbitration needed)

---

### Signal Flow: End-to-End

#### Read Transaction Example (CPU → DDR):

1. CPU issues read request:  
 $\text{cpu\_m\_axi\_araddr} = 0x10000000$   
 $\text{cpu\_m\_axi\_arlen} = 7$  (8-beat burst)
2. CPU Adapter processes:
  - Address decoder:  $0x10000000$  is in DDR range  
 $\rightarrow \text{slave\_select\_ar[DDR]} = 1$
  - Width converter: 32-bit → 64-bit (if needed)
  - Bridge ID injector: Adds  $\text{bridge\_id}=0$  (CPU's unique ID)
  - Outputs:  $\text{cpu\_master\_64b\_ar}$  (to crossbar)
3. Crossbar processes:
  - AR arbiter (DDR slave): Grants to CPU master
  - AR mux: Routes  $\text{cpu\_master\_64b\_ar} \rightarrow \text{ddr\_s\_axi\_ar}$
  - Width converter: 64-bit → 64-bit (no conversion)
  - Transaction reaches DDR controller
4. DDR responds:
  - DDR sends 8 beats:  $r_0, r_1, \dots, r_7$
  - Each beat has:  $\text{rid} = \text{original ARID}, \text{bridge\_id} = 0$
5. Crossbar returns response:
  - R demux: Looks up  $\text{bridge\_id}=0$  in CAM  
 $\rightarrow$  Maps to CPU master

- Routes: ddr\_s\_axi\_r → cpu\_master\_64b\_r
6. CPU Adapter returns:
- Width converter: 64-bit → 32-bit (if needed)
  - External: cpu\_m\_axi\_r (8 beats delivered)

**Total Latency:** 2-3 cycles (adapter + crossbar + mux)

---

### Address Decode Logic

#### Per-Master, Per-Transaction:

```
// Example: 3 slaves with different address ranges
always_comb begin
    slave_select_ar = '0; // Default: no slave selected

    if (cpu_m_axi_arvalid) begin
        // Slave 0: DDR (0x00000000 - 0x3FFFFFFF)
        if (cpu_m_axi_araddr >= 64'h00000000 &&
            cpu_m_axi_araddr < 64'h40000000)
            slave_select_ar[0] = 1'b1;

        // Slave 1: SRAM (0x40000000 - 0x4FFFFFFF)
        if (cpu_m_axi_araddr >= 64'h40000000 &&
            cpu_m_axi_araddr < 64'h50000000)
            slave_select_ar[1] = 1'b1;

        // Slave 2: APB (0x60000000 - 0x6000FFFF)
        if (cpu_m_axi_araddr >= 64'h60000000 &&
            cpu_m_axi_araddr < 64'h60010000)
            slave_select_ar[2] = 1'b1;
    end

    // If no match: DECERR will be returned
end
```

**Key Properties:** - Combinatorial (zero latency) - Per-channel (AW and AR independent) - Validates address ranges at generation time - Returns DECERR for unmapped addresses

---

### Arbitration Strategy

#### Per-Slave Round-Robin:

```
// AR channel arbiter for DDR slave
logic [NUM_MASTERS-1:0] ar_grant_ddr;
```

```

logic [NUM_MASTERS-1:0] ar_request_ddr;
logic [$clog2(NUM_MASTERS)-1:0] ar_grant_idx;

// Collect requests from all masters
assign ar_request_ddr = {
    dma_slave_select_ar[DDR] & dma_64b_arvalid,
    gpu_slave_select_ar[DDR] & gpu_64b_arvalid,
    cpu_slave_select_ar[DDR] & cpu_64b_arvalid
};

// Round-robin arbiter
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        ar_grant_idx <= '0';
    end else if (ar_request_ddr && ddr_s_axi_arready) begin
        // Grant next requester in round-robin order
        ar_grant_idx <= next_grant(ar_grant_idx, ar_request_ddr);
    end
    // Hold grant during burst (until RLAST)
end

// One-hot grant signal
assign ar_grant_ddr = (1 << ar_grant_idx);

```

**Arbitration Properties:** - Fair access (no master starves) - Grant locked during burst (AWLEN/ARLEN) - Independent per slave (no head-of-line blocking) - Separate AW and AR arbitration (concurrent read/write)

---

### Bridge ID Tracking

**Purpose:** Route responses back to correct master in multi-master systems

#### Mechanism:

Master Issues Request:

- Original AXI ID: 0x5
- Bridge adds: bridge\_id=2 (DMA master's unique ID)
- Sent to slave with both IDs

Slave Responds:

- Returns: RID=0x5, bridge\_id=2
- Crossbar CAM lookup: bridge\_id=2 maps to DMA master
- Response routed to DMA master
- DMA sees: RID=0x5 (original ID preserved)

#### CAM (Content Addressable Memory) in Slave Adapters:

```

// Per-slave CAM tracks: {trans_id, bridge_id} → master_id
logic [TRANS_ID_WIDTH-1:0] cam_trans_id [CAM_DEPTH-1:0];
logic [BRIDGE_ID_WIDTH-1:0] cam_bridge_id [CAM_DEPTH-1:0];
logic [MASTER_ID_WIDTH-1:0] cam_master_id [CAM_DEPTH-1:0];

// On AW/AR transaction: Store mapping
// On B/R response: Lookup master from bridge_id

```

**See Chapter 2.6: bridge\_cam.sv** for detailed CAM implementation.

---

### *Width Conversion Strategy*

#### **Adapter Level (Master → Internal 64-bit):**

32-bit master → Upsize to 64-bit

- Replicate data for writes
- Adjust WSTRB for byte enables
- Multiple internal beats per master beat (reads)

64-bit master → Direct connection

- Zero conversion overhead
- Pass-through logic

128-bit master → Downsize to 64-bit

- Split into multiple 64-bit beats
- Manage beat counters

#### **Crossbar Level (Internal 64-bit → Slave):**

64-bit → 32-bit slave: Downsize

- Split 64-bit beats into two 32-bit beats
- Manage beat sequencing

64-bit → 64-bit slave: Direct

- Zero overhead

64-bit → 128-bit slave: Upsize

- Combine two 64-bit beats
- Buffer and align

**Design Trade-off:** - Single 64-bit internal path simplifies crossbar - Width conversion pushed to edges (adapters and slave interfaces) - Most common widths (32, 64) have minimal conversion

---

## *Protocol Conversion (AXI4 → APB)*

**When Needed:** APB slave in mixed-protocol system

### **Conversion Adapter (Auto-Inserted):**

AXI4 Request → APB Translation:

AW/W channels → PADDR/PWDATA/PWRITE  
AR channel → PADDR/PWRITE=0

Wait for PREADY

APB Response → AXI4 Translation:

PRDATA → RDATA (reads)  
PSLVERR → RRESP/BRESP

**Limitations (APB Protocol):** - No burst support (AWLEN/ARLEN must be 0) -

Single-beat transactions only - Generator validates APB slaves don't receive bursts

**See Chapter 2.5: Protocol Converters** for detailed implementation.

---

## *Clock and Reset Architecture*

### **Single Clock Domain (Default):**

```
module bridge_top (
    input logic aclk,      // All logic uses this clock
    input logic aresetn,   // Active-low async reset
    // ...
);
```

**All components (adapters, crossbar, converters) share:** - Same clock: aclk -  
Same reset: aresetn - Simplest configuration, lowest latency

**Multiple Clock Domains (Future Enhancement):** - CDC at adapter inputs  
(master-side clock) - CDC at slave outputs (slave-side clocks) - Handshake  
synchronization - Adds 2-4 cycles latency per crossing

**Reset Behavior:** - All registers clear to known state - Arbiters reset to master 0 priority - CAMs clear all entries - Skid buffers flush - Ready for transactions after 10 cycles

---

## *Performance Paths*

### **Critical Timing Paths:**

1. **Address Decode → Arbiter → Mux**
  - Combinatorial address comparison
  - Arbiter grant logic
  - Mux selection
  - Target: 1 cycle
2. **Adapter Width Conversion**
  - Data path multiplexing
  - Beat counter logic
  - Target: 1 cycle
3. **CAM Lookup → Response Demux**
  - Parallel CAM search
  - Master selection mux
  - Target: 1 cycle

**Optimization Strategies:** - Register arbiter grants (1 cycle latency) - Pipeline address decode (if needed) - Add skid buffers (decouple timing) - Optional FIFOs for deep pipeline (>400 MHz)

**See Chapter 6: Performance** for detailed analysis and pipeline options.

---

## *Scalability Considerations*

**Small Systems (2×2, 4×4):** - Minimal resource usage (~2-5K LUTs) - High Fmax (400+ MHz) - Simple routing

**Medium Systems (8×8, 16×16):** - Moderate resources (~10-20K LUTs) - Good Fmax (300-350 MHz) - May need pipeline stages

**Large Systems (32×256):** - Significant resources (~50-100K LUTs) - Requires pipelining (250-300 MHz) - Consider hierarchical crossbar - Address decode becomes complex

**Recommendation:** For >16 masters or >32 slaves, consider hierarchical topology with multiple bridge instances.

---

### *Design Trade-offs*

| Decision                        | Pros                               | Cons                    |
|---------------------------------|------------------------------------|-------------------------|
| <b>64-bit Internal Path</b>     | Simple logic, common width         | Conversion for <64-bit  |
| <b>Per-Master Adapters</b>      | Independent paths, parallel decode | More resources          |
| <b>Round-Robin Arbitration</b>  | Fair, no starvation                | Not QoS-aware           |
| <b>Bridge ID Tracking</b>       | Supports OOO, multi-master         | CAM resources per slave |
| <b>Hard Limits (ID/Addr)</b>    | Simpler logic, faster              | Less flexible           |
| <b>Channel-Specific Masters</b> | Resource optimization              | Config complexity       |

### *Future Architecture Enhancements*

**Planned:** - Hierarchical crossbar for >32 masters - Weighted arbitration (QoS support) - Multiple internal data widths (configurable) - Deeper pipeline stages (>400 MHz target) - Optional CDC at boundaries

**Under Consideration:** - ACE protocol support (cache coherency) - AXI4-Lite variant (register-only crossbar) - Performance monitoring integration - Dynamic power gating per path

---

**Next:** [Chapter 1.3 - Clocks and Reset](#)

## **Bridge - Clocks and Reset**

### *Overview*

The Bridge component operates in a **single clock domain** by default, using a single AXI4-compatible clock (`aclk`) and active-low asynchronous reset (`aresetn`). This chapter describes clock domain strategies, reset behavior, timing requirements, and future CDC (Clock Domain Crossing) support.

**Default Configuration:** Single clock, simplest integration, lowest latency.

---

## Clock Domains

### Default: Single Clock Domain

```
module bridge_top (
    input logic          aclk,           // AXI4 clock
    input logic          aresetn,        // AXI4 active-low reset

    // All master interfaces
    input logic [3:0]    cpu_m_axi_awid,
    // ... all masters use aclk

    // All slave interfaces
    output logic [3:0]   ddr_s_axi_awid,
    // ... all slaves use aclk
);

    // All internal logic synchronous to aclk
    // - Adapters
    // - Crossbar
    // - Arbiters
    // - CAMs
    // - Skid buffers
```

**Key Properties:** - **Latency:** Minimal (2-3 cycles) - **Complexity:** Simple, no CDC logic - **Use Case:** All masters and slaves in same clock domain - **Fmax:** 300-400 MHz (typical on UltraScale+)

---

## Clock Signal Specifications

### Primary Clock: aclk

| Parameter         | Specification    | Notes                         |
|-------------------|------------------|-------------------------------|
| <b>Name</b>       | aclk             | AXI4 standard naming          |
| <b>Type</b>       | Single-ended     | No differential clock support |
| <b>Frequency</b>  | 50 MHz - 500 MHz | Typical: 200-400 MHz          |
| <b>Duty Cycle</b> | $50\% \pm 5\%$   | Standard requirement          |
| <b>Jitter</b>     | < 100 ps         | For high-frequency operation  |

| Parameter     | Specification | Notes                              |
|---------------|---------------|------------------------------------|
| <b>Domain</b> | Global        | All bridge logic shares this clock |

**Clock Routing Recommendations:** - Use dedicated clock buffers (BUFG on Xilinx) - Minimize clock skew across large designs - Consider clock tree synthesis for >200 MHz - Monitor clock utilization in synthesis reports

---

### *Reset Signal Specifications*

#### **Primary Reset: aresetn**

| Parameter       | Specification                                | Notes                                              |
|-----------------|----------------------------------------------|----------------------------------------------------|
| <b>Name</b>     | aresetn                                      | AXI4 standard naming                               |
| <b>Polarity</b> | Active-low                                   | Reset when aresetn = 0                             |
| <b>Type</b>     | Asynchronous assert<br>Synchronous de-assert | Can assert anytime<br>Must de-assert on clock edge |
| <b>Duration</b> | $\geq 10$ clock cycles                       | Minimum reset assertion                            |
| <b>Edge</b>     | De-assert on rising edge                     | Synchronized to aclk                               |

#### **Reset Behavior:**

```
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        // Async reset - enters immediately
        state <= IDLE;
        counter <= '0;
        valid_reg <= 1'b0;
    end else begin
        // Normal operation - synchronous to aclk
        // ...
    end
end
```

---

## *Reset Sequence*

### **Recommended Reset Sequence:**

1. Assert `aresetn = 0`
  - └ All registers clear to reset values
  - └ Arbiters reset to master 0 priority
  - └ CAMs flush all entries
  - └ Skid buffers empty
2. Hold for  $\geq 10$  clock cycles
  - └ Ensure all flip-flops have settled
3. De-assert `aresetn = 1` (on rising `aclk` edge)
  - └ Synchronized release across all logic
4. Wait 2-3 cycles
  - └ Allow internal state machines to initialize
5. Bridge ready for transactions
  - └ All \*\_ready signals now respond correctly

### **Timing Diagram:**

`aclk:`    \_|~|\_~|\_~|\_~|\_~|\_~|\_~|\_~|\_~|\_~|\_~|  
`aresetn:`    \_\_\_\_\_|\_|\_\_\_\_\_  
                  <reset> <- 10 cycles -><ready>  
`Internal:` XXXXX    RESET STATE    IDLE    → READY

---

## *Reset State Values*

### **All Registers Reset To:**

| Component       | Register         | Reset Value | Purpose             |
|-----------------|------------------|-------------|---------------------|
| <b>Adapters</b> | Address decoder  | Inactive    | No slave selected   |
|                 | Valid/ready regs | 0           | No transactions     |
|                 | Width converter  | 0           | Counter cleared     |
| <b>Crossbar</b> | Arbiter grant    | Master 0    | Deterministic start |
|                 | Arbiter          | 0           | Reset to first      |

| Component           | Register         | Reset Value | Purpose                |
|---------------------|------------------|-------------|------------------------|
| <b>CAMs</b>         | priority         |             | master                 |
|                     | Mux select       | 0           | Default select         |
|                     | All entries      | Invalid     | No active transactions |
| <b>Skid Buffers</b> | Entry valid bits | 0           | All entries free       |
|                     | Data registers   | 0 (or X)    | Empty state            |
|                     | Valid bits       | 0           | No stored data         |
|                     | Count            | 0           | Buffer empty           |

### Post-Reset Verification:

```
// After reset, bridge should be in known state:  
assert property (@(posedge aclk) $rose(aresetn) | -> ##3 bridge_idle);  
  
// All arbiters should grant master 0 initially  
assert property (@(posedge aclk) $rose(aresetn) | -> ##2 (ar_grant[DDR]  
== 3'b001));
```

---

### Clock Frequency Recommendations

#### By Bridge Configuration:

| Configuration | Masters×Slaves | Recommended |                          |
|---------------|----------------|-------------|--------------------------|
|               |                | Fmax        | Notes                    |
| <b>Small</b>  | 2×2, 4×4       | 400+ MHz    | Minimal logic depth      |
| <b>Medium</b> | 8×8, 16×16     | 300-350 MHz | May need skid buffers    |
| <b>Large</b>  | 32×256         | 250-300 MHz | Requires pipeline stages |

#### By FPGA Family:

| FPGA                      | Typical Fmax | Optimization Notes          |
|---------------------------|--------------|-----------------------------|
| <b>Xilinx UltraScale+</b> | 400 MHz      | Use BUFGCE for clock gating |
| <b>Xilinx 7-Series</b>    | 300 MHz      | Consider pipeline           |

| FPGA                    | Typical Fmax | Optimization Notes                |
|-------------------------|--------------|-----------------------------------|
| <b>Intel Stratix 10</b> | 450 MHz      | Excellent for crossbars<br>stages |
| <b>Intel Cyclone V</b>  | 250 MHz      | May need relaxed timing           |

**Fmax Optimization:** - Add skid buffers to break combinatorial paths - Register arbiter outputs (+1 cycle latency) - Pipeline address decode for large slave counts - See Chapter 6: Performance for deep pipeline options

---

### *Multiple Clock Domain Support (Future)*

#### **Planned Enhancement: Optional CDC**

```
module bridge_top #(
    parameter CDC_ENABLE = 0      // 0=single clock, 1=cdc enabled
) (
    // Master-side clock domain
    input logic          master_aclk,
    input logic          master_aresetn,

    // Slave-side clock domain (if CDC_ENABLE=1)
    input logic          slave_aclk,
    input logic          slave_aresetn,

    // Master interfaces (master_aclk domain)
    input logic [3:0]    cpu_m_axi_awid,
    // ...

    // Slave interfaces (slave_aclk domain if CDC_ENABLE=1)
    output logic [3:0]   ddr_s_axi_awid,
    // ...
);

```

#### **CDC Strategy (When Implemented):**

| Master Domain (fast) | CDC Boundary              | Slave Domain (slow) |
|----------------------|---------------------------|---------------------|
| Masters → Adapters   | AsyncFIFO per AXI channel | Slave Logic         |
| aclk=400MHz          | ↔ Gray code               | ↔ aclk=200MHz       |

counters  
Handshake

**CDC Specifications (Future):** - Handshake-based synchronization - Independent clock frequencies - Gray code counters for pointer crossing - 2-4 cycle latency penalty - Configurable FIFO depth per channel

**Use Cases for CDC:** - Fast CPU clock, slow memory clock - Independent peripheral clock domains - Power-optimized slave domains - Legacy IP with fixed clocks

**Status:** CDC support is planned but not yet implemented. Current version requires single clock domain.

---

### *Clock Domain Crossing Considerations*

**If You Need CDC Today:**

Use external CDC components between bridge and slaves:

```
// Bridge in fast domain
bridge_top u_bridge (
    .aclk      (fast_clk),      // 400 MHz
    .aresetn  (fast_rst_n),
    // ... master interfaces (fast_clk)
    // ... slave interfaces output (fast_clk)
    .ddr_s_axi_*(ddr_fast_*),   // Fast clock domain
);

// External CDC FIFO
axi_async_fifo u_cdc (
    .s_aclk      (fast_clk),
    .s_aresetn  (fast_rst_n),
    .s_axi_*     (ddr_fast_*),   // From bridge
    .m_aclk      (slow_clk),     // 200 MHz
    .m_aresetn  (slow_rst_n),
    .m_axi_*     (ddr_slow_*),   // To DDR controller
);

// DDR controller in slow domain
ddr_controller u_ddr (
    .clk      (slow_clk),
    .rst_n   (slow_rst_n),
    .s_axi_* (ddr_slow_*),
);
```

**Available CDC Components:** - Xilinx AXI4 Clock Converter IP - Intel Avalon Clock Crossing Bridge - Custom async FIFO with AXI4 wrappers

---

### *Power Considerations*

#### **Clock Gating (Future Enhancement):**

```
// Per-adapter clock gating (not yet implemented)
logic cpu_adapter_clk_en;
assign cpu_adapter_clk_en = cpu_m_axi_arvalid |
                           cpu_m_axi_awvalid |
                           internal_busy;

BUFGCE cpu_clk_gate (
    .I(aclk),
    .CE(cpu_adapter_clk_en),
    .O(cpu_adapter_gated_clk)
);
```

**Dynamic Voltage and Frequency Scaling (DVFS):** - Not currently supported - Would require CDC between domains - Future consideration for low-power designs

---

### *Timing Constraints (SDC Example)*

#### **For Synthesis:**

```
# Create primary clock
create_clock -period 2.500 -name aclk [get_ports aclk]

# Set input delay (relative to aclk)
set_input_delay -clock aclk -max 0.500 [get_ports {*_m_axi_*}]
set_input_delay -clock aclk -min 0.100 [get_ports {*_m_axi_*}]

# Set output delay (relative to aclk)
set_output_delay -clock aclk -max 0.500 [get_ports {*_s_axi_*}]
set_output_delay -clock aclk -min 0.100 [get_ports {*_s_axi_*}]

# Reset is asynchronous
set_false_path -from [get_ports aresetn]

# Multicycle paths (if using pipeline stages)
set_multicycle_path -setup 2 -through [get_pins -hier *_skid_*]
set_multicycle_path -hold 1 -through [get_pins -hier *_skid_*]
```

---

## Clock Quality Requirements

### For Reliable Operation:

| Metric             | Requirement   | Impact if Violated                 |
|--------------------|---------------|------------------------------------|
| Jitter             | < 100 ps RMS  | Timing violations, data corruption |
| Duty Cycle         | 45%-55%       | Setup/hold issues                  |
| Skew               | < 200 ps      | Metastability risk                 |
| Frequency Accuracy | $\pm 100$ ppm | Generally not critical for logic   |

**Clock Source Recommendations:** - PLL-generated clocks (most FPGA designs) - Crystal oscillator (for standalone systems) - Clock from master IP (if AXI master provides clock)

---

## Reset Distribution

### Synchronizer for Asynchronous Reset:

```
// Reset synchronizer (good practice)
logic aresetn_sync_1, aresetn_sync_2;

always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        aresetn_sync_1 <= 1'b0;
        aresetn_sync_2 <= 1'b0;
    end else begin
        aresetn_sync_1 <= 1'b1;
        aresetn_sync_2 <= aresetn_sync_1;
    end
end

// Use synchronized reset throughout design
wire aresetn_internal = aresetn_sync_2;
```

**Why Synchronize?** - Prevents metastability on reset de-assertion - Ensures all flops see same reset release time - Required for proper operation at high frequencies

**Reset Tree Considerations:** - Use reset buffers for large designs - Minimize reset fanout (< 10K flops per driver) - Consider hierarchical reset trees

---

## *Debugging Clock and Reset Issues*

### **Common Problems:**

#### **Problem 1: Bridge Not Responding After Reset**

Symptoms: \*\_ready signals stay low

Check:

- Reset held for  $\geq 10$  cycles?
- aresetn synchronized to aclk edge?
- Clock actually toggling (scope)?

#### **Problem 2: Random Data Corruption**

Symptoms: Intermittent errors, not reproducible

Check:

- Clock jitter (should be <100ps)
- Clock duty cycle (should be 45-55%)
- Timing violations (setup/hold)
- CDC issues if using multiple clocks

#### **Problem 3: Synthesis Timing Violations**

Symptoms: Synth reports failing paths

Solutions:

- Reduce target Fmax
  - Add pipeline stages (skid buffers)
  - See Chapter 6: Performance
- 

## *Clock and Reset Checklist*

Before integration:

- Single clock source (aclk) connected to all bridge logic
- Clock frequency within recommended range
- Clock duty cycle 45-55%
- Reset signal (aresetn) is active-low
- Reset held for  $\geq 10$  cycles
- Reset de-asserted synchronously (on aclk rising edge)
- Reset synchronizer implemented (recommended)
- Timing constraints applied (SDC file)
- No clock domain crossings (or external CDC FIFOs used)
- Post-synthesis timing clean (no violations)

---

## *Future Enhancements*

**Planned:** - Native CDC support (CDC\_ENABLE parameter) - Per-adapter clock gating for power - Multiple slave clock domains - Automatic clock relationship constraints

**Under Consideration:** - DVFS support - Clock monitoring/failover - Asynchronous reset removal (fully synchronous design)

---

**Next:** [Chapter 1.4 - Acronyms](#)

## **1.4 Acronyms and Abbreviations**

This section provides definitions for acronyms and abbreviations used throughout the Bridge documentation.

### **A**

#### **AHB**

Advanced High-performance Bus - AMBA bus protocol for high-performance systems

#### **APB**

Advanced Peripheral Bus - AMBA bus protocol for low-power peripheral interfaces

#### **AMBA**

Advanced Microcontroller Bus Architecture - ARM's on-chip communication standard

#### **ARADDR**

AXI Read Address - Address field in AXI read address channel

#### **ARBURST**

AXI Read Burst Type - Burst type indicator for read transactions

#### **ARCACHE**

AXI Read Cache - Cache attributes for read transactions

**ARID**

AXI Read ID - Transaction identifier for read address channel

**ARLEN**

AXI Read Length - Number of data transfers in read burst

**ARLOCK**

AXI Read Lock - Lock type for atomic read operations

**ARPROT**

AXI Read Protection - Protection attributes for read transactions

**ARQOS**

AXI Read Quality of Service - QoS identifier for read transactions

**ARREADY**

AXI Read Address Ready - Slave indicates readiness to accept read address

**ARSIZE**

AXI Read Size - Size of each transfer in read burst

**ARVALID**

AXI Read Address Valid - Master indicates valid read address/control

**AWADDR**

AXI Write Address - Address field in AXI write address channel

**AWBURST**

AXI Write Burst Type - Burst type indicator for write transactions

**AWCACHE**

AXI Write Cache - Cache attributes for write transactions

**AWID**

AXI Write ID - Transaction identifier for write address channel

**AWLEN**

AXI Write Length - Number of data transfers in write burst

**AWLOCK**

AXI Write Lock - Lock type for atomic write operations

**AWPROT**

AXI Write Protection - Protection attributes for write transactions

**AWQOS**

AXI Write Quality of Service - QoS identifier for write transactions

**AWREADY**

AXI Write Address Ready - Slave indicates readiness to accept write address

**AWSIZE**

AXI Write Size - Size of each transfer in write burst

**AWVALID**

AXI Write Address Valid - Master indicates valid write address/control

**AXI**

Advanced eXtensible Interface - High-performance protocol in AMBA family

**AXI4**

Advanced eXtensible Interface version 4 - Latest full AXI specification

**AXI4-Lite**

Simplified subset of AXI4 for control/status registers

**AXI4-Stream**

AXI4 variant optimized for streaming data flows

**B****BID**

AXI Write Response ID - Transaction identifier for write response channel

**BREADY**

AXI Write Response Ready - Master indicates readiness to accept write response

**BRESP**

AXI Write Response - Response status for write transaction

**BVALID**

AXI Write Response Valid - Slave indicates valid write response

**C****CAM**

Content Addressable Memory - Associative memory used for ID tracking

**CDC**

Clock Domain Crossing - Circuitry for signals crossing clock boundaries

**CLOG2**

Ceiling Log Base 2 - Function to calculate minimum bit width

**CSV**

Comma-Separated Values - Configuration file format option

**D****DMA**

Direct Memory Access - Hardware subsystem for autonomous data transfers

**DRC**

Design Rule Check - Automated verification of design constraints

**DUT**

Device Under Test - Component being verified in testbench

**E****ECC**

Error Correction Code - Algorithm for detecting/correcting bit errors

**F****FIFO**

First-In-First-Out - Queue data structure for buffering

**FSM**

Finite State Machine - Sequential logic controller

**G****GUI**

Graphical User Interface - Visual tool for configuration management

**H****HPET**

High Precision Event Timer - Reference specification for documentation format

**I****ID**

Identifier - Unique tag for tracking transactions

**ILA**

Integrated Logic Analyzer - FPGA debugging tool

**IP**

Intellectual Property - Reusable design component

**J****JSON**

JavaScript Object Notation - Data interchange format

**L****LFSR**

Linear Feedback Shift Register - Pseudo-random sequence generator

**LSB**

Least Significant Bit - Rightmost bit in binary representation

**LUT**

Look-Up Table - FPGA logic element

**M****MAS**

Micro-Architecture Specification - Detailed design documentation format

**MSB**

Most Significant Bit - Leftmost bit in binary representation

**MUX**

Multiplexer - Circuit selecting one of multiple inputs

**N****NOP**

No Operation - Idle state or null transaction

**O****OOO**

Out-Of-Order - Transactions completing in different order than issued

**OOR**

Out-Of-Range - Address outside valid address space

## P

### **PDF**

Portable Document Format - Document file format for distribution

### **PRDATA**

APB Read Data - Data field in APB read transaction

### **PRD**

Product Requirements Document - High-level specification document

### **PREADY**

APB Ready - Slave indicates completion of transfer

### **PSEL**

APB Select - Slave selection signal

### **PSLVERR**

APB Slave Error - Error response signal

### **PWDATA**

APB Write Data - Data field in APB write transaction

## Q

### **QoS**

Quality of Service - Priority or performance level indicator

## R

### **RDATA**

AXI Read Data - Data field in AXI read data channel

### **RID**

AXI Read Response ID - Transaction identifier for read data channel

### **RLAST**

AXI Read Last - Indicates final transfer in read burst

### **RREADY**

AXI Read Data Ready - Master indicates readiness to accept read data

### **RRESP**

AXI Read Response - Response status for read data transfer

**RTL**

Register Transfer Level - Hardware description abstraction level

**RVALID**

AXI Read Data Valid - Slave indicates valid read data

**S****SDC**

Synopsys Design Constraints - Timing constraint file format

**SoC**

System-on-Chip - Integrated circuit containing complete system

**STA**

Static Timing Analysis - Method for verifying timing requirements

**SV**

SystemVerilog - Hardware description and verification language

**T****TCAM**

Ternary Content Addressable Memory - CAM with don't-care states

**TOML**

Tom's Obvious Minimal Language - Configuration file format

**U****UVM**

Universal Verification Methodology - SystemVerilog verification framework

**V****VCD**

Value Change Dump - Waveform file format

**VIP**

Verification IP - Reusable verification component

**W****WDATA**

AXI Write Data - Data field in AXI write data channel

**WLAST**

AXI Write Last - Indicates final transfer in write burst

**WREADY**

AXI Write Data Ready - Slave indicates readiness to accept write data

**WSTRB**

AXI Write Strobe - Byte-lane write enable signals

**WVALID**

AXI Write Data Valid - Master indicates valid write data

**X****XBAR**

Crossbar - Interconnect allowing any master to access any slave

**xdc**

Xilinx Design Constraints - Xilinx timing constraint file format

---

## Bridge-Specific Terms

**Bridge ID (BID\_WIDTH)**

Internal identifier added to transactions for routing responses back to originating master

**Channel-Specific Master**

Master port specialized for read-only, write-only, or read-write operations

**Master Adapter**

Per-master module providing timing isolation and protocol conversion

**Round-Robin Arbitration**

Fair scheduling algorithm cycling through requesters

**Skid Buffer**

Pipeline register allowing single-cycle backpressure response

**Slave Router**

Module decoding addresses and routing requests to slaves

**Width Conversion**

Logic adapting between different data bus widths

---

**Note:** Signal names from the AXI4 specification (AR, AW, W, B, R\*) are shown in uppercase as they typically appear in RTL. However, when used in SystemVerilog code they may appear in lowercase or with prefixes/suffixes depending on naming conventions.

## 1.5 References

This section provides references to external specifications, tools, and related documentation that support the Bridge component.

### Specifications and Standards

#### AMBA AXI4 Specification

##### ARM IHI 0022E (ID022613)

AMBA AXI and ACE Protocol Specification AXI3, AXI4, and AXI4-Lite, ACE and ACE-Lite

Publisher: ARM Limited

URL: <https://developer.arm.com/documentation/ihi0022/latest/>

This is the authoritative specification for the AXI4 protocol implemented by the Bridge. Key sections: - Chapter A2: Signal Descriptions - Chapter A3: Single Interface Requirements - Chapter A5: Transaction Attributes - Chapter A7: Interconnect and Ordering

#### AMBA APB Specification

##### ARM IHI 0024C

AMBA APB Protocol Specification

Publisher: ARM Limited

URL: <https://developer.arm.com/documentation/ihi0024/latest/>

Referenced for APB slave conversion functionality.

#### SystemVerilog IEEE Standard

##### IEEE Std 1800-2017

IEEE Standard for SystemVerilog - Unified Hardware Design, Specification, and Verification Language

Publisher: IEEE

The Bridge RTL is generated as SystemVerilog IEEE 1800-2017 compliant code.

## Tools and Frameworks

### Verilator

Open-source SystemVerilog simulator and lint tool

URL: <https://www.veripool.org/verilator/>

Documentation: <https://verilator.org/guide/latest/>

Verilator Version: 5.006 or later recommended

Used for: RTL simulation, lint checking

### CocoTB

Coroutine-based cosimulation testbench environment for verifying VHDL and SystemVerilog RTL

URL: <https://www.cocotb.org/>

Documentation: <https://docs.cocotb.org/>

GitHub: <https://github.com/cocotb/cocotb>

CocoTB Version: 1.8.0 or later

Used for: Bridge verification framework, test development

### Python

Python Programming Language

URL: <https://www.python.org/>

Documentation: <https://docs.python.org/3/>

Python Version: 3.8 or later required

Used for: Bridge generator, test infrastructure, automation scripts

### Make

GNU Make - Build automation tool

URL: <https://www.gnu.org/software/make/>

Documentation: <https://www.gnu.org/software/make/manual/>

Used for: Build system, test execution, automation

## **TOML**

Tom's Obvious Minimal Language

URL: <https://toml.io/>

Specification: <https://toml.io/en/v1.0.0>

Configuration file format for Bridge specifications

## **Jinja2**

Template engine for Python

URL: <https://palletsprojects.com/p/jinja/>

Documentation: <https://jinja.palletsprojects.com/>

Jinja2 Version: 3.0 or later

Used for: RTL template generation

## **Related RTL Components**

### **Stream Components**

**Location:** projects/components/stream/

**Documentation:** projects/components/stream/docs/stream\_spec/

Stream protocol components that may be integrated with Bridge masters/slaves for data streaming applications.

### **Common Utilities**

**Location:** projects/components/common/

Shared RTL utilities used across multiple projects, including:  
- FIFO implementations  
- Clock domain crossing modules  
- Synchronizers  
- Standard interfaces

### **AMBA Components**

**Location:** projects/components/amba/

Additional AMBA protocol components that complement the Bridge:  
- AXI monitors  
- AXI protocol checkers  
- APB components

# Project Documentation

## Global Requirements

**Location:** GLOBAL\_REQUIREMENTS.md

Project-wide coding standards, naming conventions, and design rules.

## Testing Guide

**Location:** TESTING.md

Project-level testing methodology and infrastructure documentation.

## Product Requirements Document

**Location:** PRD.md

High-level product requirements for the RTL Design Sherpa project.

## Design and Verification Resources

### AXI4 Transaction Ordering

ARM Developer: Understanding AXI4 Transaction Ordering

Useful for understanding ordering requirements in multi-master systems.

### CocoTB Verification Guide

CocoTB Documentation: Writing Testbenches

[https://docs.cocotb.org/en/stable/writing\\_testbenches.html](https://docs.cocotb.org/en/stable/writing_testbenches.html)

Essential reading for developing Bridge test cases.

### Python Async/Await

Python Documentation: Coroutines and Tasks

<https://docs.python.org/3/library/asyncio-task.html>

Background on async programming model used in CocoTB tests.

## FPGA Vendor Tools

### Xilinx Vivado

Xilinx Vivado Design Suite

URL: <https://www.xilinx.com/products/design-tools/vivado.html>

For synthesis, implementation, and timing analysis on Xilinx FPGAs.

Relevant constraint formats: - XDC (Xilinx Design Constraints) - SDC (Synopsys Design Constraints)

### **Intel Quartus**

Intel Quartus Prime

URL:

<https://www.intel.com/content/www/us/en/products/details/fpga/development-tools/quartus-prime.html>

For synthesis and implementation on Intel FPGAs.

### **Open Source Hardware**

#### **LibreCores**

Collaborative platform for open source digital hardware designs

URL: <https://www.libreccores.org/>

Community resource for reusable IP cores and design methodologies.

#### **FOSSi Foundation**

Free and Open Source Silicon Foundation

URL: <https://www.fossi-foundation.org/>

Promotes free and open digital hardware design.

### **Academic and Industry Papers**

#### **High-Performance Interconnects**

##### **“On-Chip Networks” by Natalie Enright Jerger and Li-Shiuan Peh**

Morgan & Claypool Publishers (2009)

Comprehensive coverage of NoC architectures and interconnect design.

#### **AXI Protocol Analysis**

Various whitepapers available from ARM and FPGA vendors discussing: - AXI4 performance optimization - Width conversion techniques - Clock domain crossing strategies - Arbitration algorithms

## **Conference Proceedings**

### **Design Automation Conference (DAC)**

Annual conference covering EDA tools and methodologies

URL: <https://www.dac.com/>

### **International Conference on Computer-Aided Design (ICCAD)**

Research in design automation and verification

URL: <https://iccad.com/>

## **Online Communities and Forums**

### **Stack Overflow - Verilog/SystemVerilog**

Questions and answers on HDL design

Tag: [systemverilog], [verilog], [axi]

### **Reddit - r/FPGA**

Community discussion on FPGA design and development

URL: <https://www.reddit.com/r/FPGA/>

### **Verification Academy**

Mentor Graphics verification training and resources

URL: <https://verificationacademy.com/>

## **Version Control**

### **Git**

Distributed version control system

URL: <https://git-scm.com/>

Documentation: <https://git-scm.com/doc>

Project uses Git for source control and collaboration.

## **Documentation Tools**

### **Markdown**

Lightweight markup language

Specification: <https://commonmark.org/>

Used for all Bridge documentation files.

## Pandoc

Universal document converter

URL: <https://pandoc.org/>

Recommended tool for converting Markdown to PDF for final documentation deliverables.

## Graphviz

Graph visualization software

URL: <https://graphviz.org/>

For generating architecture diagrams (future enhancement).

## Mermaid

JavaScript-based diagramming and charting tool

URL: <https://mermaid.js.org/>

For generating diagrams in Markdown (future enhancement).

## Bridge-Specific References

### Generator Architecture

**Location:** Appendix A - Generator Deep Dive

Detailed documentation on the Python-based RTL generation system.

### Verification Framework

**Location:** Chapter 5 - Verification

Comprehensive guide to the CocoTB-based test suite.

### Configuration Examples

**Location:** `projects/components/bridge/configs/`

Sample TOML/CSV configuration files for various Bridge topologies.

### Test Cases

**Location:** `val/bridge/`

Complete set of verification tests with documentation.

## Errata and Updates

For the latest updates, bug fixes, and errata:

**Project Repository:** Check commit history and issue tracker

**Internal Documentation:** Refer to release notes for version-specific changes

---

## Document Revision History

This references section is maintained as part of the Bridge MAS documentation. For questions or suggestions regarding additional references, consult the project maintainers.

**Note:** External URLs and specifications are subject to change. If a link is broken, consult the publisher's main website or use a web search engine to locate updated resources.

## 2.1 Master Adapter

The Master Adapter is a per-master module that provides timing isolation, protocol normalization, and request preparation for the crossbar interconnect. Each master port in the bridge has its own dedicated adapter instance.

### 2.1.1 Purpose and Function

The Master Adapter serves several critical functions:

1. **Timing Isolation:** Inserts pipeline registers (skid buffers) to break combinatorial paths from master to crossbar
2. **Channel Specialization:** Separates read-only, write-only, and read-write masters for optimal resource utilization
3. **Bridge ID Injection:** Adds internal tracking IDs to transactions for response routing
4. **Protocol Normalization:** Ensures all transactions meet crossbar assumptions and constraints
5. **Backpressure Management:** Handles ready/valid handshaking with single-cycle latency

## 2.1.2 Block Diagram



### Master Adapter Architecture

*Figure 2.1: Master Adapter architecture showing per-master skid buffers, bridge ID injection/stripping, and channel specialization for rd/wr/rw configurations.*

## 2.1.3 Channel Specialization

The Bridge generator creates three types of master adapters based on the channel usage specified in the configuration:

### Read-Only Masters

**Channels:** AR (Address Read), R (Read Data)

Configuration: "rd" or "read"

RTL Generated: adapter\_master\_rd\_<id>.sv

Optimizations:

- No write address channel logic
- No write data channel logic
- No write response channel logic
- Reduced arbitration participation (read path only)

### Write-Only Masters

**Channels:** AW (Address Write), W (Write Data), B (Write Response)

Configuration: "wr" or "write"

RTL Generated: adapter\_master\_wr\_<id>.sv

Optimizations:

- No read address channel logic
- No read data channel logic
- Reduced arbitration participation (write path only)

## Read-Write Masters

Channels: AR, R, AW, W, B (Full AXI4 interface)

Configuration: "rw" or "readwrite"

RTL Generated: adapter\_master\_rw\_<id>.sv

Full Functionality:

- All five AXI4 channels implemented
- Participates in both read and write arbitration
- Maximum flexibility but larger resource footprint

## 2.1.4 Skid Buffer Architecture

Each AXI4 channel passes through a skid buffer for timing isolation. The skid buffer implements:

### Registered Forward Path

```
// Simplified skid buffer concept
always_ff @(posedge clk) begin
    if (!rst_n) begin
        valid_reg <= 1'b0;
    end else if (!valid_reg || ready_out) begin
        valid_reg <= valid_in;
        data_reg  <= data_in;
    end
end
```

### Single-Cycle Backpressure Response

- When downstream is not ready, accepts one additional beat into holding register
- Signals ready\_in = 0 in same cycle to upstream
- No combinatorial paths between upstream and downstream

### Pipeline Depth

- Default: 1 stage (skid buffer)
- Configurable: Up to 8 stages for high-frequency designs
- Trade-off: Latency vs. timing closure

**Performance Impact:** - Adds 1 cycle latency per channel - Enables higher clock frequencies - Prevents critical paths through crossbar

## 2.1.5 Bridge ID Management

### ID Width Calculation

`BID_WIDTH = clog2(num_masters)`

Examples:

- 2 masters → `BID_WIDTH = 1`
- 4 masters → `BID_WIDTH = 2`
- 8 masters → `BID_WIDTH = 3`
- 16 masters → `BID_WIDTH = 4`

### ID Injection (Request Path)

For each master adapter, a unique constant Bridge ID is appended to the AXI transaction ID:

AR Channel:

```
Internal ARID = {MASTER_BID[BID_WIDTH-1:0], External  
ARID[ARID_WIDTH-1:0]}  
Internal ARID Width = BID_WIDTH + ARID_WIDTH
```

AW Channel:

```
Internal AWID = {MASTER_BID[BID_WIDTH-1:0], External  
AWID[AWID_WIDTH-1:0]}  
Internal AWID Width = BID_WIDTH + AWID_WIDTH
```

**Example:** 4-master system, external ARID\_WIDTH = 4

```
Master 0: BID = 2'b00, External ID = 4'h3 → Internal ID = 6'b00_0011  
Master 1: BID = 2'b01, External ID = 4'h3 → Internal ID = 6'b01_0011  
Master 2: BID = 2'b10, External ID = 4'h5 → Internal ID = 6'b10_0101  
Master 3: BID = 2'b11, External ID = 4'ha → Internal ID = 6'b11_1010
```

### ID Stripping (Response Path)

The Bridge ID is removed from responses before returning to the master:

R Channel:

```
External RID = Internal RID[ARID_WIDTH-1:0]  
Bridge routes based on Internal RID[BID_WIDTH+ARID_WIDTH-  
1:ARID_WIDTH]
```

B Channel:

```
External BID = Internal BID[AWID_WIDTH-1:0]  
Bridge routes based on Internal BID[BID_WIDTH+AWID_WIDTH-  
1:AWID_WIDTH]
```

This ensures:

- Master sees original transaction IDs
- Bridge internally tracks which master originated each transaction
- Responses route back to correct master even with ID reuse across masters

## 2.1.6 Protocol Normalization

The Master Adapter enforces several protocol requirements:

### Valid Address Ranges

- Checks addresses against slave address maps
- Flags out-of-range accesses for error handling
- Prevents deadlocks from illegal addresses

### Burst Constraints

- Validates ARLEN/AWLEN vs. 4KB boundary rules
- Ensures burst types are supported (FIXED, INCR, WRAP)
- Checks SIZE vs. DATA\_WIDTH compatibility

### Signal Defaults

- Provides default values for unused signals
- Ensures ARCACHE/AWCACHE have legal values
- Sets ARPROT/AWPROT based on configuration

## 2.1.7 Interface Specifications

### External Master Interface (per master)

Input Channels (from master):

- ARVALID, ARREADY, ARADDR, ARBURST, ARCACHE, ARID, ARLEN, ARLOCK, ARPROT, ARQOS, ARSIZE
- AWVALID, AWREADY, AWADDR, AWBURST, AWCACHE, AWID, AWLEN, AWLOCK, AWPROT, AWQOS, AWSIZE
- WVALID, WREADY, WDATA, WSTRB, WLAST

Output Channels (to master):

- RVALID, RREADY, RDATA, RID, RLAST, RRESP
- BVALID, BREADY, BID, BRESP

### Internal Crossbar Interface (per master)

Output Channels (to crossbar):

- AR\* signals + Internal ARID (wider)
- AW\* signals + Internal AWID (wider)

- W\* signals (unchanged)

Input Channels (from crossbar):

- R\* signals + Internal RID (wider)
- B\* signals + Internal BID (wider)

## 2.1.8 Resource Utilization

### Per-Master Adapter Resources (Typical)

#### Read-Write Master (64-bit data, 32-bit addr, 4-bit ID):

Logic Elements: ~200-400 (depending on ID width and features)

Registers: ~150-250 (pipeline stages + control)

Block RAM: 0 (CAM is in crossbar core)

Breakdown:

- Skid buffers (5 channels × ~30 regs each): ~150 regs
- ID manipulation logic: ~50 LEs
- Control FSMs: ~100 LEs
- Routing logic: ~50 LEs

**Read-Only Master:** ~60% of read-write resources

**Write-Only Master:** ~65% of read-write resources

### Scaling Considerations

Resource usage scales with:

- Number of masters (linear scaling, one adapter per master)
- ID width (logarithmic: +BID\_WIDTH bits per transaction)
- Pipeline depth (linear: deeper pipelines = more registers)
- Data width (linear: wider data = wider skid buffers)

## 2.1.9 Timing Characteristics

### Latency

**Request Path** (Master → Crossbar):

- Minimum: 1 cycle (skid buffer)
- With 4-stage pipeline: 4 cycles
- With ID injection: +0 cycles (combinatorial within stage)

**Response Path** (Crossbar → Master):

- Minimum: 1 cycle (skid buffer)
- With 4-stage pipeline: 4 cycles
- With ID stripping: +0 cycles (combinatorial within stage)

**Total Round-Trip Overhead:** 2-8 cycles (depending on pipeline depth)

### Throughput

- **Maximum:** 1 transaction per cycle per channel (fully pipelined)
- **Backpressure:** Propagates with 1-cycle latency

- **Burst Performance:** No degradation; bursts flow continuously when ready

## Critical Paths

Typical critical paths (if skid buffers not used): - Master ARVALID → Crossbar arbitration logic - Crossbar grant → Master ARREADY - Slave RDATA → Master RDATA

**Solution:** Skid buffers break all these paths at the cost of 1-cycle latency.

## 2.1.10 Configuration Parameters

### Per-Master Parameters (from TOML/CSV)

```
[[masters]]
name = "cpu"
channels = "rw"          # "rd", "wr", or "rw"
arid_width = 4            # External ID width
awid_width = 4            # Can differ from ARID
addr_width = 32           # Address bus width
data_width = 64           # Data bus width
pipeline_depth = 1         # Skid buffer stages (1-8)
```

### Global Parameters (affect all adapters)

```
[bridge]
internal_data_width = 64    # Crossbar data width
enable_width_conversion = true
bid_width = 2                # Calculated: clog2(num_masters)
```

## 2.1.11 Debug and Observability

### Recommended Debug Signals

For ILA or waveform capture:

- Master adapter input valid/ready (all channels)
- Master adapter output valid/ready (all channels)
- Bridge ID assignments per transaction
- Pipeline stage occupancy
- Backpressure events (ready = 0 while valid = 1)

### Common Issues and Debug

**Symptom:** Master hangs waiting for READY

**Check:** - Is adapter receiving grants from arbiters? - Is slave responding to requests? - Is response path clear?

**Symptom:** Incorrect data returned to master

**Check:** - Bridge ID extraction logic - ID width mismatches - Response routing in crossbar

**Symptom:** Timing violations

**Check:** - Increase pipeline\_depth parameter - Verify clock frequency vs. design complexity - Check for combinatorial loops

## 2.1.12 Future Enhancements

### Planned Features

- **Dynamic Pipeline Depth:** Adjust depth based on operating frequency
- **FIFO Mode:** Deeper buffering (16-256 entries) for burst-intensive masters
- **QoS Support:** Priority-based arbitration hints
- **Performance Counters:** Transaction counts, stall cycles, utilization metrics

### Under Consideration

- **Clock Domain Crossing:** Per-master clock domains with async FIFOs
- **Width Conversion at Adapter:** Move width logic to adapters for distributed conversion
- **Outstanding Transaction Tracking:** Windowing for improved OOO performance

---

**Related Sections:** - Section 2.3: Crossbar Core (interconnect architecture) - Section 2.4: Arbitration (how adapters compete for slaves) - Section 2.5: ID Management (CAM structures for response routing) - Section 3.2: Master Port Interface (signal-level specifications)

## 2.2 Slave Router

The Slave Router is responsible for address decoding and routing master requests to the appropriate slave. Each master has its own dedicated slave router instance that examines addresses and directs transactions to one of the configured slaves or generates error responses for out-of-range addresses.

### 2.2.1 Purpose and Function

The Slave Router performs the following critical functions:

- Address Decoding:** Matches request addresses against configured slave address ranges
- Request Routing:** Directs AR/AW/W channels to the selected slave
- Out-of-Range Detection:** Identifies addresses that don't map to any slave
- Error Response Generation:** Creates DECERR responses for invalid addresses
- Default Slave Support:** Routes unmapped addresses to optional default slave

## 2.2.2 Block Diagram



## Slave Router Architecture

Figure 2.2: Slave router architecture showing address decoding, routing matrix, and out-of-range detection for AW and AR channels.

## 2.2.3 Address Decoding Algorithm

### Configuration-Based Address Maps

Each slave is configured with:

```
[[slaves]]
name = "memory"
base_address = 0x4000_0000
size = 0x1000_0000      # 256 MB
default = false          # Not a default slave
```

The router generates address ranges:

```
Start Address = base_address
End Address   = base_address + size - 1
```

## Decoding Priority

When multiple slaves have overlapping address ranges, the router uses **first-match priority**:

Priority Order = Order in configuration file (top to bottom)

Example:

```
Slave 0: 0x0000_0000 - 0x0FFF_FFFF (checked first)
Slave 1: 0x1000_0000 - 0x1FFF_FFFF (checked second)
Slave 2: 0x2000_0000 - 0x2FFF_FFFF (checked third)
```

Address 0x1000\_5000 → Matches Slave 1

## Range Checking Logic

For each address, the router performs:

```
// Simplified address decode logic
logic [NUM_SLAVES-1:0] slave_match;

for (int i = 0; i < NUM_SLAVES; i++) begin
    slave_match[i] = (addr >= SLAVE_BASE[i]) &&
                      (addr <= SLAVE_END[i]);
end

// Priority encode: Select first matching slave
logic [$clog2(NUM_SLAVES)-1:0] slave_select;
always_comb begin
    slave_select = 0;
    for (int i = 0; i < NUM_SLAVES; i++) begin
        if (slave_match[i]) begin
            slave_select = i;
            break; // First match wins
        end
    end
end
end

// Out-of-range detection
logic oor = ~(|slave_match); // No slaves matched
```

## Power-of-Two Optimization

For slaves with power-of-two sizes starting at aligned addresses, simplified decode:

```
// Optimized decode for: base=0x8000_0000, size=0x1000_0000 (256MB)
// Mask off lower bits: Only check upper bits
logic match = (addr[31:28] == 4'h8); // Much simpler than range check
```

The generator automatically detects and applies this optimization.

## 2.2.4 Request Routing

### AR Channel Routing

Read address routing flow: 1. **Decode**: Determine target slave from ARADDR 2. **Check OOR**: If no slave matches, flag for error response 3. **Route**: Send AR transaction to selected slave's arbiter 4. **Track**: Remember routing decision for R channel responses

ARADDR → Decoder → Slave Selection → AR to Slave Arbiter  
↓  
If OOR → Error Response Generator

### AW/W Channel Routing

Write transactions require coordinated routing:

#### 1. AW Phase:

- Decode AWADDR to determine target slave
- Route AW transaction to selected slave's arbiter
- **Store routing decision** for subsequent W beats

#### 2. W Phase:

- **Follow AW routing** (W channel has no address)
- Route all W beats to same slave as AW
- Continue until WLAST = 1

AWADDR → Decoder → Slave Selection → AW to Slave Arbiter  
↓  
Store Routing  
↓  
W[0..N] → W to Same Slave  
(until WLAST)

### Write Data Tracking FSM

State Machine for W Channel Routing:

#### IDLE:

- Wait for AWVALID && AWREADY
- On handshake: Store target slave, goto WRITING

#### WRITING:

- Route W beats to stored slave
- On WVALID && WREADY && WLAST: goto IDLE

### Error Handling:

- If AW was OOR: Discard W beats, generate BRESP error

## 2.2.5 Out-of-Range Handling

### Detection

An address is out-of-range (OOR) if:

$$\forall \text{ slaves}[i]: (\text{address} < \text{base}[i]) \text{ OR } (\text{address} > \text{end}[i])$$

Unless a default slave is configured.

### Error Response Generation

When OOR is detected:

#### Read Transactions (AR → R):

1. Accept ARVALID (assert ARREADY)
2. Do NOT forward to any slave
3. Generate R response internally:
  - RID = Original ARID (with BID preserved)
  - RDATA = 0xBADDCAFE\_DEADBEEF (debug pattern) or all zeros
  - RRESP = 2'b11 (DECERR)
  - RLAST = 1 (for each beat if burst)
4. Latency: Typically 2-3 cycles after AR acceptance

#### Write Transactions (AW/W → B):

1. Accept AWVALID (assert AWREADY)
2. Accept all W beats until WLAST (sink the data)
3. Do NOT forward to any slave
4. Generate B response internally:
  - BID = Original AWID (with BID preserved)
  - BRESP = 2'b11 (DECERR)
5. Latency: Typically 2-3 cycles after WLAST

### Debug Data Patterns

For OOR read responses, configurable data patterns aid debugging:

Option 1: All zeros (default)  
RDATA = 64'h0000\_0000\_0000\_0000

Option 2: Debug signature  
RDATA = 64'hBADD\_CAFE\_DEAD\_BEEF

Option 3: Address echo (LSBs)  
RDATA = {32'hBADD\_ADDR, ARADDR[31:0]}

```
Option 4: Master/Slave ID indicator  
RDATA = {8'hEE, Master_ID[7:0], Slave_ID[7:0], ARADDR[47:0]}
```

## 2.2.6 Default Slave Support

### Configuration

```
[[slaves]]  
name = "error_responder"  
base_address = 0x0          # Ignored for default slave  
size = 0x0                  # Ignored for default slave  
default = true              # Catch-all for unmapped addresses
```

### Behavior

When a default slave is configured:

- Addresses that don't match any specific slave → Routed to default slave
- Default slave typically returns DECERR but with configurable response
- Useful for prototyping (accept all addresses initially)
- Can implement memory-mapped debug register for address capture

**Note:** Only ONE default slave allowed per bridge.

## 2.2.7 Address Aliasing

### Multiple Slaves, Same Address

If configuration has overlapping ranges:

```
[[slaves]]  
name = "fast_cache"  
base_address = 0x8000_0000  
size = 0x1000_0000  
  
[[slaves]]  
name = "slow_memory"  
base_address = 0x8000_0000  # Same base!  
size = 0x4000_0000
```

**Result:** First-match priority applies. All accesses go to fast\_cache. The slow\_memory range 0x9000\_0000-0xBFFF\_FFFF is **unreachable** from this master.

**Warning:** Generator can optionally flag this as error in DRC mode.

### Intentional Aliasing Use-Cases

Legitimate uses of overlapping ranges:

1. **Cache hierarchy:** Small fast cache shadows larger slow memory
2. **Memory remapping:** Different views of same

physical memory

3. **Peripheral mirroring:** Register block appears at multiple addresses

## 2.2.8 Configuration Parameters

### Per-Router Parameters

# Router behavior is defined by slave configurations

```
[[slaves]]  
name = "ddr_memory"  
base_address = 0x8000_0000  
size = 0x4000_0000          # 1 GB  
default = false  
oor_data_pattern = 0xDEAD    # Pattern for OOR reads  
  
[Global Parameters]  
[bridge]  
enable_default_slave = false      # Allow default slave  
strict_address_decode = true       # Flag overlapping ranges as errors  
oor_response_latency = 2           # Cycles for OOR error response (2-4)
```

## 2.2.9 Resource Utilization

### Per-Router Resources (Typical)

#### 4-slave configuration (32-bit address):

Logic Elements: ~150-200 LEs  
Registers: ~50 regs  
Block RAM: 0

#### Breakdown:

- Address comparators (4 slaves × ~30 LEs): ~120 LEs
- Priority encoder: ~20 LEs
- W channel FSM: ~30 regs, ~20 LEs
- OOR error generator: ~20 regs, ~10 LEs

#### 8-slave configuration:

Logic Elements: ~250-350 LEs (scales roughly with slave count)  
Registers: ~60 regs

### Scaling Considerations

Resource usage scales with:

- **Number of slaves:** Linear (each slave adds comparator logic)
- **Address width:** Minimal impact (wider comparators, but same structure)
- **Optimizations:** Power-of-two sizes reduce logic significantly

## 2.2.10 Timing Characteristics

### Decode Latency

**Combinatorial Decode (Default):** - Address → Slave selection: 0 cycles (combinatorial) - Critical path: ARADDR → Slave arbiter request - May limit maximum frequency in large systems

**Registered Decode (Optional):** - Address → Slave selection: 1 cycle (registered) - Adds latency but breaks critical path - Recommended for >8 slaves or >300 MHz operation

### Throughput

- **Maximum:** 1 transaction per cycle per master
- **No blocking:** Router does not stall; arbiters handle conflicts
- **Pipelining:** AR and AW decode in parallel (independent paths)

### Critical Paths

Typical critical paths: - ARADDR → Address comparators → Priority encoder → Arbiter request - w/4 slaves: ~10-15 logic levels (FPGA-dependent) - w/8 slaves: ~12-18 logic levels

**Mitigation:** - Enable registered decode mode (+1 cycle latency) - Use power-of-two slave sizes (simplified compare) - Synthesizer optimization directives

## 2.2.11 Debug and Observability

### Recommended Debug Signals

- Address decode outputs (which slave matched)
- OOR flags (per channel)
- Default slave hit counter
- W channel FSM state
- Routing decision storage (for W tracking)

### Common Issues and Debug

**Symptom:** Reads return all zeros

**Check:** - Is address out-of-range? - Check slave base/size configuration - Verify address decode logic in waveform

**Symptom:** Write data goes to wrong slave

**Check:** - W channel tracking FSM state - Did AW routing complete before W started? - Check for AWVALID/AWREADY handshake

**Symptom:** Unexpected DECERR responses

**Check:** - Address decode configuration - Overlapping slave ranges (wrong priority) - Off-by-one in size calculations

## 2.2.12 Verification Considerations

### Address Decode Tests

Critical test scenarios: 1. **Boundary conditions:** base\_address, base\_address + size - 1 2. **Just out-of-range:** base\_address - 1, base\_address + size 3. **Each slave:**

Verify routing to correct slave 4. **Overlapping ranges:** Verify priority encoding 5.

**Default slave:** Unmapped addresses route correctly

### Write Tracking Tests

W channel FSM testing: 1. **Simple write:** Single AW, single W (WLAST=1) 2. **Burst write:** AW with AWLEN=7, eight W beats 3. **Back-to-back writes:** New AW before previous W completes 4. **Interleaved masters:** Multiple masters writing simultaneously (if supported)

### OOR Error Tests

Test: Read from unmapped address

- Send AR to 0xFFFF\_FFFF (assuming unmapped)
- Verify R response with RRESP = DECERR
- Verify RID matches ARID
- Check response latency

Test: Write to unmapped address

- Send AW to invalid address
- Send W data
- Verify B response with BRESP = DECERR
- Verify BID matches AWID

## 2.2.13 Performance Optimization

### Techniques

#### 1. Registered Decode (High Frequency)

Trade-off: +1 cycle latency for timing closure

Best for: >8 slaves, >300 MHz targets

#### 2. Simplified Address Decode (Power-of-2)

Optimization: Mask-based compare instead of range check

Best for: Slaves with aligned, power-of-2 sizes

Savings: ~50% reduction in comparator logic

### **3. Parallel Decode (Low Logic)**

Implementation: Separate decoders per address bit-field  
Best for: Many non-overlapping slaves  
Savings: Reduces logic depth for priority encoding

### **4. CAM-Based Decode (Many Slaves)**

Trade-off: Uses block RAM for address lookup table  
Best for: >16 slaves, non-contiguous ranges  
Note: Requires RAM resources

## **2.2.14 Future Enhancements**

### **Planned Features**

- **Dynamic Address Remapping:** Runtime-configurable slave ranges
- **Transaction Filtering:** Block certain address ranges per-master
- **Priority Hints:** QoS-based prioritization (beyond first-match)
- **Address Translation:** Offset/mask transformations before slave routing

### **Under Consideration**

- **Multi-region Slaves:** Slave spans multiple non-contiguous ranges
  - **Secure Address Spaces:** Per-master access control lists
  - **Debug Address Capture:** Log invalid addresses to register
- 

**Related Sections:** - Section 2.1: Master Adapter (upstream from router) - Section 2.3: Crossbar Core (downstream arbitration) - Section 2.4: Arbitration (how routed requests compete) - Section 3.3: Slave Port Interface (target of routed requests)

## **2.3 Crossbar Core**

The Crossbar Core is the central interconnect fabric that enables any-to-any connectivity between masters and slaves. It contains the arbitration logic, transaction routing, and response path management that form the heart of the bridge.

### **2.3.1 Purpose and Function**

The Crossbar Core performs the following critical functions:

1. **Full Connectivity:** Provides complete  $N \times M$  master-to-slave interconnect matrix
2. **Independent Arbitration:** Per-slave arbiters allow parallel access to different slaves
3. **Response Routing:** Directs slave responses back to originating masters using Bridge IDs
4. **Transaction Ordering:** Maintains AXI ordering requirements within address dependencies
5. **Backpressure Management:** Handles flow control across multiple concurrent transactions

### 2.3.2 Block Diagram

Crossbar Core Architecture  
Full 5-Channel AXI4 Switching Fabric



## Crossbar Core Architecture

*Figure 2.3: Crossbar core architecture showing complete  $M \times S$  switching fabric with address decode, per-slave arbitration, data path multiplexing, and ID-based response routing.*

### 2.3.3 Connectivity Matrix

#### Full $N \times M$ Crossbar

The bridge implements a **non-blocking crossbar** where:

- N masters can each access different slaves simultaneously
- Conflicts occur only when multiple masters target the same slave
- Independent read and write paths increase parallelism

**Example:** 4 masters, 3 slaves

|    | S0 | S1 | S2 |
|----|----|----|----|
| M0 | ●  | ●  | ●  |
| M1 | ●  | ●  | ●  |
| M2 | ●  | ●  | ●  |
| M3 | ●  | ●  | ●  |

● = Connection available

#### Request Path Multiplexing

For each slave, requests from all masters are multiplexed:

Slave 0 Request Inputs:

- M0 → S0 (AR, AW, W channels)
- M1 → S0 (AR, AW, W channels)
- M2 → S0 (AR, AW, W channels)
- M3 → S0 (AR, AW, W channels)

Arbiter selects one master per channel per cycle

- Grants propagate through MUX
- Selected master's transaction forwarded to slave

#### Response Path Demultiplexing

Responses from slaves are demultiplexed back to masters:

Slave 0 Response Outputs (R, B channels):

- Extract Bridge ID from RID/BID
- Route to corresponding master (M0, M1, M2, or M3)
- Strip Bridge ID before delivering to master adapter

Uses CAM (Content Addressable Memory) for fast ID lookup

## 2.3.4 Request Path Architecture

### Per-Slave Request Arbitration

Each slave has **independent arbiters** for: 1. **AR Channel**: Read address arbitration (all masters competing) 2. **AW/W Channels**: Write address/data arbitration (all masters competing)

This separation allows: - Simultaneous read and write to same slave (if slave supports it) - Independent grant decisions for AR vs. AW channels - Better throughput for mixed read/write workloads

### Request Multiplexers

After arbitration, multiplexers select granted master's signals:

```
// Simplified AR channel MUX for Slave 0
always_comb begin
    case (ar_grant_s0)
        2'b00: begin // Master 0 granted
            s0_arvalid = m0_arvalid;
            s0_araddr = m0_araddr;
            s0_arid = m0_arid; // Includes Bridge ID
            // ... other AR signals
        end
        2'b01: begin // Master 1 granted
            s0_arvalid = m1_arvalid;
            s0_araddr = m1_araddr;
            s0_arid = m1_arid;
            // ... other AR signals
        end
        // ... cases for M2, M3
    endcase
end
```

### Backpressure Propagation

Ready signals flow back from slave through arbiter to granted master:

Slave S0 ARREADY → Arbiter → Granted Master ARREADY  
↓ Other Masters ARREADY = 0

This ensures: - Only granted master sees READY from slave - Non-granted masters see READY = 0 (backpressure) - No combinatorial loops in ready path (registered arbitration)

## 2.3.5 Response Path Architecture

### Bridge ID Extraction

Response routing uses the Bridge ID embedded in transaction IDs:

R Channel:

```
Slave RID = {BID[BID_WIDTH-1:0], Original_ID[ID_WIDTH-1:0]}
Extract: Master_Index = BID
Route R response to Master[Master_Index]
```

B Channel:

```
Slave BID = {BID[BID_WIDTH-1:0], Original_ID[ID_WIDTH-1:0]}
Extract: Master_Index = BID
Route B response to Master[Master_Index]
```

### CAM-Based Routing (Optional)

For large master counts or OOO responses, a CAM tracks outstanding transactions:

CAM Entry Structure:

- Internal ID (with BID)
- Master index
- Transaction type (read/write)
- Timestamp (for timeout detection)

Lookup:

```
Input: RID or BID from slave
Output: Master index for routing
Latency: 1 cycle (registered CAM)
```

**Benefit:** Handles complex scenarios like ID reordering, burst interleaving

### Response Demultiplexers

Based on extracted Bridge ID, responses are routed:

```
// Simplified R channel DEMUX from Slave 0
logic [BID_WIDTH-1:0] master_id;
assign master_id = s0_rid[TOTAL_ID_WIDTH-1:ID_WIDTH]; // Extract BID

always_comb begin
    // Default: No masters receive response
    m0_rvalid = 1'b0;
    m1_rvalid = 1'b0;
    m2_rvalid = 1'b0;
    m3_rvalid = 1'b0;
```

```

// Route to indicated master
case (master_id)
    2'b00: begin
        m0_rvalid = s0_rvalid;
        m0_rdata  = s0_rdata;
        m0_rid    = s0_rid[ID_WIDTH-1:0]; // Strip BID
        // ... other R signals
    end
    2'b01: begin
        m1_rvalid = s0_rvalid;
        // ... route to M1
    end
    // ... M2, M3
endcase
end

```

## Multi-Slave Response Merging

When multiple slaves can respond simultaneously, arbitration ensures:

- Only one slave's response delivered per master per cycle
- Fair arbitration if multiple slaves have responses for same master
- No response loss (responses queued until master ready)

### 2.3.6 AXI Ordering Requirements

The crossbar maintains AXI ordering rules:

#### Read-After-Write (RAW) Ordering

**Rule:** Read from address must see data from earlier write to same address

**Crossbar Behavior:**

- Ordering enforced at slave level (slave handles RAW within itself)
- Crossbar does NOT reorder transactions to same slave from same master
- Different masters to same slave: No ordering guaranteed (slave must handle)

#### Write-After-Write (WAW) Ordering

**Rule:** Writes to overlapping addresses must complete in issue order

**Crossbar Behavior:**

- Same master to same slave: Order preserved by arbiter (FIFO grant queue)
- Different masters to same slave: Slave responsible for write ordering

## Out-of-Order (OOO) Completion

**Allowed:** - Read responses can return out-of-order (different RIDs) - Reads to different slaves can complete in any order - Writes to different slaves can complete in any order

**Crossbar Support:** - Bridge ID tracking enables OOO response routing - Master sees responses in slave-determined order - Multi-master OOO requires careful slave design

## 2.3.7 Resource Utilization

### Crossbar Core Resources

**4 masters × 3 slaves configuration (64-bit data, 32-bit addr):**

Logic Elements: ~2000-3500 LEs  
Registers: ~800-1200 regs  
Block RAM: 0-4 KB (if CAM used)

Breakdown per slave:

|                            |                     |
|----------------------------|---------------------|
| - Arbiter (4 masters, RR): | ~200 LEs, ~50 regs  |
| - Request MUX (AR/AW/W):   | ~400 LEs, ~100 regs |
| - Response DEMUX (R/B):    | ~300 LEs, ~80 regs  |
| - Control FSMs:            | ~100 LEs, ~50 regs  |

Total for 3 slaves:  $3 \times 1000$  LEs = ~3000 LEs

Plus routing overhead: +500 LEs

### Scaling with Masters and Slaves

**Linear scaling:** - Adding 1 master: +~500 LEs per slave (new arbiter input) -  
Adding 1 slave: +~1000 LEs (complete new slave port)

**Example:** 8 masters × 6 slaves

Estimated: ~12,000 LEs, ~3000 regs  
Block RAM: ~8 KB (for CAM if enabled)

### Optimization Techniques

- Read-Only/Write-Only Masters:** Reduces arbiter complexity by 40-50%
- Power-of-Two Master Count:** Simplifies BID width and routing logic
- Pipeline Stages:** Trading latency for frequency (deeper pipelines)

## 2.3.8 Timing Characteristics

### Latency

**Request Path** (Master → Slave): - Arbiter decision: 1 cycle (registered) - MUX selection: 0 cycles (combinatorial) or +1 cycle (registered) - **Total:** 1-2 cycles through crossbar

**Response Path** (Slave → Master): - BID extraction: 0 cycles (combinatorial) - DEMUX routing: 0 cycles (combinatorial) or +1 cycle (registered) - **Total:** 0-1 cycles through crossbar

**End-to-End** (Master adapter → Slave adapter): - Adapters: 2 cycles (skid buffers) - Router: 0-1 cycles (decode) - Crossbar: 1-2 cycles (arbitration + MUX) - **Total:** 3-5 cycles minimum latency

### Throughput

**Best Case** (no conflicts): - Each master to different slave: N transactions/cycle ( $N =$  master count) - Maximum aggregate bandwidth:  $N \times \text{data\_width}$  bits/cycle

**Arbitration Limits:** - Multiple masters to one slave: 1 transaction/cycle to that slave - Other slaves remain available for parallel access

**Burst Performance:** - Once granted, bursts flow at 1 beat/cycle - Grant held until burst completes (RLAST or WLAST)

## 2.3.9 Critical Paths

### Common Critical Paths

1. **Arbiter Request → Grant:**
  - All masters' VALID signals → Arbiter logic → Grant decision
  - Depth: ~5-8 logic levels for 4 masters
2. **Grant → Ready Backpressure:**
  - Slave READY → Arbiter → Selected master READY
  - Depth: ~4-6 logic levels
3. **Response Demux:**
  - Slave RDATA/RID → BID extraction → Master select → Master RDATA
  - Depth: ~6-10 logic levels for 64-bit data

### Mitigation Strategies

1. **Registered MUX/DEMUX:** +1 cycle latency, breaks paths

2. **Pipelined Arbitration:** Multi-cycle arbiter for >8 masters
3. **Hierarchical Crossbar:** For >16 masters, use tree structure

## 2.3.10 Configuration Parameters

### Crossbar Parameters (from TOML)

#### [bridge]

```
num_masters = 4
num_slaves = 3
internal_data_width = 64
arbiter_type = "round_robin"          # "round_robin", "fixed_priority",
"weighted"
registered_mux = false               # true = +1 cycle, better timing
registered_demux = false             # true = +1 cycle, better timing
enable_cam = false                   # true = CAM-based routing
cam_depth = 16                      # Outstanding transactions tracked
```

## 2.3.11 Debug and Observability

### Recommended Debug Signals

Per Slave:

- Arbiter grants (which master granted)
- Arbiter requests (which masters requesting)
- Request MUX outputs (VALID, READY, ADDR, ID)
- Response DEMUX inputs (VALID, READY, DATA, ID)

Global:

- Active transactions count
- Stall counters (arbiter conflicts)
- BID extraction errors
- CAM hit/miss (if enabled)

### Performance Counters

Useful metrics for profiling:

- Transactions per slave (read, write separate)
- Arbiter conflict rate (requests denied due to grant contention)
- Average grant latency
- Utilization per master (% cycles active)
- Utilization per slave (% cycles busy)

## 2.3.12 Common Issues and Debug

**Symptom:** Master hangs with VALID=1, READY=0

**Check:** - Is another master holding grant to this slave? - Is arbiter logic functioning (check grant signals)? - Is slave responding with READY?

**Symptom:** Response goes to wrong master

**Check:** - Bridge ID values (verify correct BID per master) - CAM contents (if used)  
- BID extraction logic (check bit positions)

**Symptom:** Throughput lower than expected

**Check:** - Arbiter conflicts (multiple masters to same slave?) - Pipeline depth (excessive latency reducing effective bandwidth?) - Burst efficiency (are bursts being granted properly?)

## 2.3.13 Verification Considerations

### Functional Tests

1. **Single Master to Each Slave:** Verify basic connectivity
2. **All Masters to One Slave:** Stress arbiter fairness
3. **All Masters to All Slaves:** Maximum parallelism test
4. **OOO Responses:** Issue transactions with different latencies
5. **Burst Interleaving:** Multiple masters with bursts to same slave

### Corner Cases

- Back-to-back grants (no idle cycles)
- Grant held for maximum burst length (256 beats)
- Rapid master switching (each gets 1 transaction then switches)
- Response while request in progress (pipelining)
- All masters requesting same slave simultaneously

### Protocol Compliance

- AXI4 protocol checker at each master/slave interface
- Verify no READY → VALID dependencies (AXI violation)
- Check ID preservation through crossbar (modulo Bridge ID)
- Verify LAST signal handling

## 2.3.14 Future Enhancements

### Planned Features

- **Weighted Round-Robin:** QoS support with configurable priorities
- **Slave-Side Arbitration Policies:** Per-slave arbiter configuration
- **Grant Prediction:** Speculative grant for lower latency
- **Congestion Control:** Throttling to prevent hotspots

### Under Consideration

- **Partial Crossbar:** Configurable master-to-slave connectivity (not full mesh)

- **Multi-Tier Hierarchy:** For 32+ masters/slaves
  - **Virtual Channels:** Separate channels for different traffic classes
  - **Register Slicing:** Automatic pipeline insertion for timing
- 

**Related Sections:** - Section 2.1: Master Adapter (request sources) - Section 2.2: Slave Router (address decode before arbitration) - Section 2.4: Arbitration (detailed arbiter algorithms) - Section 2.5: ID Management (CAM structures, Bridge ID tracking) - Section 3.1: Top-Level Integration (crossbar instantiation)

## 2.4 Arbitration

Arbitration is the mechanism by which the bridge decides which master gains access to a slave when multiple masters simultaneously request the same slave. Each slave has dedicated arbiters for its read and write channels, enabling fair and efficient resource allocation.

### 2.4.1 Purpose and Function

The arbiter performs the following critical functions:

1. **Conflict Resolution:** Selects one master when multiple masters request same slave
2. **Fairness:** Ensures all masters eventually get access (starvation-free)
3. **Throughput Optimization:** Minimizes idle cycles and maximizes utilization
4. **Grant Management:** Maintains grants for burst transactions
5. **Priority Enforcement:** Supports priority-based access policies (optional)

### 2.4.2 Arbitration Architecture

#### Per-Slave, Per-Channel Arbiters

Each slave has **two independent arbiters**:

Slave 0:

- AR Arbiter (Read Address Channel)
  - \* Inputs: M0\_ARVALID, M1\_ARVALID, M2\_ARVALID, M3\_ARVALID
  - \* Output: Grant[1:0] → Selects one master
- AW Arbiter (Write Address Channel)
  - \* Inputs: M0\_AWVALID, M1\_AWVALID, M2\_AWVALID, M3\_AWVALID

- \* Output: Grant[1:0] → Selects one master
- \* Coordinates with W channel data flow

**Independence:** AR and AW arbiters operate independently, allowing simultaneous read and write to same slave (if slave supports full-duplex operation).

### Block Diagram



### 2.4.3 Round-Robin Arbitration

#### Algorithm

The **Round-Robin (RR)** arbiter cycles through masters in order, giving each master one opportunity to access the slave:

Priority Order (rotates after each grant):

Cycle 0: M0 > M1 > M2 > M3

Cycle 1: M1 > M2 > M3 > M0 (M0 was granted, now lowest priority)

Cycle 2: M2 > M3 > M0 > M1 (M1 was granted)  
 Cycle 3: M3 > M0 > M1 > M2 (M2 was granted)  
 Cycle 4: M0 > M1 > M2 > M3 (M3 was granted, cycle repeats)

## Implementation

```
// Simplified Round-Robin arbiter (4 masters)
logic [1:0] last_grant;           // Which master was granted last
logic [3:0] request;             // Current requests (VALID signals)
logic [1:0] grant;               // Grant decision

always_ff @(posedge clk) begin
    if (!rst_n) begin
        last_grant <= 2'b00;
    end else if (|request && slave_ready) begin
        // Update last_grant when transaction completes
        if (grant_accepted) begin
            last_grant <= grant;
        end
    end
end

always_comb begin
    // Default: No grant
    grant = 2'b00;

    // Priority encode starting after last grant
    case (last_grant)
        2'b00: begin // Last grant to M0, try M1, M2, M3, M0
            if (request[1]) grant = 2'b01;
            else if (request[2]) grant = 2'b10;
            else if (request[3]) grant = 2'b11;
            else if (request[0]) grant = 2'b00;
        end
        2'b01: begin // Last grant to M1, try M2, M3, M0, M1
            if (request[2]) grant = 2'b10;
            else if (request[3]) grant = 2'b11;
            else if (request[0]) grant = 2'b00;
            else if (request[1]) grant = 2'b01;
        end
        // ... similar for M2, M3
    endcase
end
```

## Characteristics

**Fairness:** Excellent - Each master gets equal access over time

**Latency:** Bounded - Maximum wait =  $(N-1) \times \text{transaction\_time}$

**Throughput:** Optimal - No idle cycles when requests pending

**Starvation:** None - Every master eventually gets grant

**Best Use Cases:** - Peers with similar priority - General-purpose interconnects - Default choice for most bridges

## 2.4.4 Fixed-Priority Arbitration

### Algorithm

The **Fixed-Priority** arbiter always grants to the highest-priority requesting master:

Priority Order (never changes):  
M0 > M1 > M2 > M3 (M0 always highest priority)

Grant Decision:

```
if (M0_VALID) → Grant = M0
else if (M1_VALID) → Grant = M1
else if (M2_VALID) → Grant = M2
else if (M3_VALID) → Grant = M3
```

### Implementation

```
// Fixed-priority arbiter (4 masters)
logic [3:0] request;
logic [1:0] grant;

always_comb begin
    // Priority encoder: M0 highest, M3 lowest
    if (request[0])      grant = 2'b00; // M0 highest priority
    else if (request[1]) grant = 2'b01;
    else if (request[2]) grant = 2'b10;
    else if (request[3]) grant = 2'b11; // M3 lowest priority
    else                  grant = 2'b00; // Default (no requests)
end
```

### Characteristics

**Fairness:** Poor - Low-priority masters can starve

**Latency:** Variable - High-priority: minimal, Low-priority: unbounded

**Throughput:** Optimal - No idle cycles when requests pending

**Starvation:** Possible for low-priority masters

**Best Use Cases:** - Real-time systems with priority requirements - CPU (high) vs. DMA (low) scenarios - Time-critical masters need guaranteed access

### Starvation Prevention

Optional enhancement: **Priority aging**

- Track cycles since last grant per master
- Temporarily boost priority of starved masters
- Reset age counter after grant

Example:

M0 age = 100 cycles → Boost M0 above normal priority  
M1 age = 5 cycles → Normal priority

## 2.4.5 Weighted Arbitration

### Algorithm

The **Weighted** arbiter assigns different access rates to masters based on weights:

Configuration:

M0 weight = 4 (50% of bandwidth)  
M1 weight = 2 (25% of bandwidth)  
M2 weight = 1 (12.5% of bandwidth)  
M3 weight = 1 (12.5% of bandwidth)

Grant Sequence (repeating pattern):

M0, M0, M0, M0, M1, M1, M2, M3 (8 cycles total)  
M0 gets 4/8, M1 gets 2/8, M2 gets 1/8, M3 gets 1/8

### Implementation

```
// Weighted arbiter using credit counter
logic [7:0] credits [0:3]; // Credit counters per master
logic [1:0] grant;

always_ff @(posedge clk) begin
    if (!rst_n) begin
        credits[0] <= 8'd4; // M0 weight
        credits[1] <= 8'd2; // M1 weight
        credits[2] <= 8'd1; // M2 weight
        credits[3] <= 8'd1; // M3 weight
    end else if (grant_accepted) begin
        // Decrement granted master's credits
        credits[grant] <= credits[grant] - 1;

        // Reload when all credits exhausted
        if (all_credits_zero) begin
            credits[0] <= 8'd4;
            credits[1] <= 8'd2;
            credits[2] <= 8'd1;
            credits[3] <= 8'd1;
        end
    end
end
end
```

```

always_comb begin
    // Grant to master with highest credits and active request
    grant = select_highest_credit_with_request(credits, request);
end

```

## Characteristics

**Fairness:** Configurable - Proportional to weights

**Latency:** Bounded - Depends on weight ratio

**Throughput:** Near-optimal - Small overhead for credit tracking

**Starvation:** None - All masters get share per weight

**Best Use Cases:** - QoS requirements (guaranteed bandwidth) - Mixed workload (video + CPU + DMA) - Service-level agreements

## 2.4.6 Burst Handling

### Grant Locking

Once granted, a master **holds the grant** until its burst completes:

AR Channel Burst:

1. Master asserts ARVALID with ARLEN = 7 (8-beat burst)
2. Arbiter grants to this master
3. Grant held until slave returns RLAST
4. Other masters blocked during this time

AW/W Channel Burst:

1. Master asserts AWVALID with AWLEN = 15 (16-beat burst)
2. Arbiter grants to this master
3. Grant held until master asserts WLAST
4. Other masters blocked during W data transfer

### Implementation

```

// Burst grant locking
logic grant_locked;
logic [1:0] locked_master;

always_ff @(posedge clk) begin
    if (!rst_n) begin
        grant_locked <= 1'b0;
    end else begin
        if (grant_accepted && !last_beat) begin
            // Lock grant for burst
            grant_locked <= 1'b1;
            locked_master <= grant;
        end else if (last_beat) begin
            // Release grant when burst completes
        end
    end

```

```

        grant_locked <= 1'b0;
    end
end
assign grant = grant_locked ? locked_master : arbiter_decision;

```

## Fairness Considerations

Long bursts can block other masters:  
- Maximum AXI burst: 256 beats  
- At 1 cycle/beat: Up to 256 cycles blocked  
- Impacts latency for other masters

**Mitigation:**  
- Limit burst lengths in master configuration  
- Use burst interleaving (complex, not implemented in Phase 1)  
- Monitor arbiter stall counters

## 2.4.7 Resource Utilization

### Per-Arbiter Resources

#### Round-Robin (4 masters):

Logic Elements: ~150 LEs  
Registers: ~40 regs

##### Breakdown:

- Priority encoder: ~80 LEs
- Last grant tracking: ~10 regs
- Request capture: ~20 regs
- Grant FSM: ~40 LEs, ~10 regs

#### Fixed-Priority (4 masters):

Logic Elements: ~80 LEs  
Registers: ~20 regs

Simpler than RR (no rotation logic)

#### Weighted (4 masters):

Logic Elements: ~200 LEs  
Registers: ~60 regs

Additional credit counters and comparison logic

## Scaling with Master Count

Resource usage scales approximately **O(N<sup>2</sup>)** where N = master count:

2 masters: ~50 LEs  
4 masters: ~150 LEs  
8 masters: ~400 LEs  
16 masters: ~1200 LEs

The  $N^2$  scaling comes from:  
- Priority encoder: Compares all pairs of masters  
- Grant MUX: N-to-1 multiplexing increases with N

## 2.4.8 Timing Characteristics

### Arbitration Latency

**Single-Cycle Arbitration** (default):

Request → Grant Decision: 1 cycle  
Grant Decision → MUX Output: 0 cycles (combinatorial)  
Total: 1 cycle

**Multi-Cycle Arbitration** (high frequency):

Request → Grant Decision: 2-3 cycles (pipelined)  
Improves timing at cost of latency

### Critical Paths

Typical critical paths in arbiter:  
1. **Request collection**: All master VALID signals → Arbiter input  
2. **Priority encode**: Compare all requests → Grant decision  
3. **Grant propagate**: Grant → MUX select → Slave interface

**Path Depth**: - 4 masters: ~6-8 logic levels - 8 masters: ~8-12 logic levels - 16 masters: ~12-16 logic levels

## 2.4.9 Configuration Parameters

### Arbiter Configuration (TOML)

```
[bridge]
arbiter_type = "round_robin"      # "round_robin", "fixed_priority",
"weighted"

[bridge.arbitration]
pipeline_stages = 1                # 1-3 (more = better timing, higher
latency)
enable_priority_aging = false       # Prevent starvation in fixed-
priority
aging_threshold = 1000              # Cycles before priority boost

# Weighted arbitration weights (only if arbiter_type = "weighted")
[[bridge.arbitration.weights]]
```

```

master = "cpu"                                # Relative weight (1-255)
weight = 4

[[bridge.arbitration.weights]]
master = "dma"
weight = 2

[[bridge.arbitration.weights]]
master = "periph"
weight = 1

```

## 2.4.10 Debug and Observability

### Recommended Debug Signals

Per Arbiter:

- Request vector (all masters requesting)
- Grant decision (which master granted)
- Grant locked status (burst in progress)
- Last grant (for RR rotation tracking)

Performance Counters:

- Grants per master (utilization)
- Denied requests per master (contention)
- Average grant latency per master
- Arbiter conflict cycles (multiple requests, one grant)

### Common Issues and Debug

**Symptom:** Master never gets grant (starvation)

**Check:** - Arbiter type (fixed-priority can starve low-priority masters) - Other masters continuously requesting? - Enable priority aging if using fixed-priority

**Symptom:** Unfair access (one master dominates)

**Check:** - Arbiter type (should be round-robin for fairness) - Burst lengths (long bursts block others) - Request patterns (one master requesting more frequently)

**Symptom:** Low throughput despite requests

**Check:** - Arbiter conflicts (too many masters to one slave) - Pipeline depth (excessive arbitration latency) - Burst efficiency (short bursts waste cycles)

## 2.4.11 Verification Considerations

### Test Scenarios

1. Fair Access Test (Round-Robin):

- All masters continuously request same slave
- Verify each master gets equal grants over 1000 cycles
- Expected: Each of 4 masters gets ~250 grants

### 2. Priority Test (Fixed-Priority):

- High-priority master requests intermittently
- Low-priority masters request continuously
- Verify high-priority always wins when requesting

### 3. Burst Locking Test:

- Master issues 16-beat burst
- Other masters request during burst
- Verify grant held until LAST
- Verify other masters blocked during burst

### 4. Weighted Bandwidth Test:

- Configure M0=4, M1=2, M2=1, M3=1
- All masters request continuously for 10000 cycles
- Verify grants proportional: M0≈50%, M1≈25%, M2≈12.5%, M3≈12.5%

## Corner Cases

- Single master requesting (trivial grant)
- All masters requesting simultaneously (worst-case arbitration)
- Grant released and new grant same cycle (back-to-back)
- Master drops request after arbiter latency (stale grant)
- Maximum length burst (256 beats blocking)

## 2.4.12 Performance Impact

### Arbiter Choice Impact

**Round-Robin:** - **Pros:** Fair, starvation-free, predictable latency - **Cons:** Cannot prioritize time-critical masters - **Use:** General-purpose, peer masters

**Fixed-Priority:** - **Pros:** Guaranteed low latency for high-priority masters - **Cons:** Low-priority can starve, unpredictable for low-priority - **Use:** Real-time, priority-sensitive systems

**Weighted:** - **Pros:** Configurable bandwidth allocation, QoS support - **Cons:** More complex, small overhead - **Use:** Mixed workload, SLA requirements

### Arbiter Efficiency

Assuming all masters request same slave continuously:

**Best Case:** 100% efficiency (one master)

- No arbitration conflicts
- Zero idle cycles

Typical Case: 25-50% per-master efficiency (4 masters, round-robin)

- Each master gets ~25% of time grants
- Load balancing to other slaves improves

Worst Case: Heavy contention

- 4 masters to 1 slave: 25% efficiency per master
- Solution: Add slaves or use more slaves in parallel

## 2.4.13 Future Enhancements

### Planned Features

- **Dynamic Weight Adjustment:** Runtime-configurable weights via registers
- **QoS Classes:** Multiple priority levels with configurable policies
- **Deadline-Based Arbitration:** Grant based on transaction deadlines
- **Predictive Arbitration:** Speculative grants to reduce latency

### Under Consideration

- **Multi-Level Arbitration:** Hierarchical for >16 masters
- **Token Bucket:** Rate limiting per master
- **History-Based:** Learn access patterns and optimize grants
- **ECC Protection:** For arbitration state (safety-critical systems)

---

**Related Sections:** - Section 2.3: Crossbar Core (arbiter integration) - Section 2.1: Master Adapter (request sources) - Section 2.5: ID Management (transaction tracking during grants) - Chapter 6: Performance (arbiter impact on throughput)

## 2.5 ID Management

ID Management is the system by which the bridge tracks outstanding transactions and routes responses back to the originating master. This is accomplished through Bridge ID injection, Content Addressable Memory (CAM) structures, and ID translation logic.

### 2.5.1 Purpose and Function

The ID management system performs the following critical functions:

1. **Transaction Tracking:** Maintains association between requests and originating masters

2. **Response Routing:** Directs responses to correct master using embedded IDs
3. **Out-of-Order Support:** Handles responses returning in different order than issued
4. **ID Space Isolation:** Prevents ID conflicts between multiple masters
5. **Burst Management:** Tracks multi-beat bursts through the bridge

## 2.5.2 Bridge ID Concept

### Problem Statement

Without ID management, the bridge cannot determine which master originated a transaction:

Scenario: Two masters with overlapping IDs

Master 0: Issues read with ARID = 4'h5

Master 1: Issues read with ARID = 4'h5 (same ID!)

When slave responds with RID = 4'h5, which master gets the response?  
 → Ambiguous! Need additional information.

### Solution: Bridge ID Injection

The bridge adds a **Bridge ID (BID)** to each transaction ID:

Internal ID = {Bridge\_ID, Original\_ID}

Master 0, ARID = 4'h5:

Internal ARID = {2'b00, 4'h5} = 6'b00\_0101

Master 1, ARID = 4'h5:

Internal ARID = {2'b01, 4'h5} = 6'b01\_0101

Now responses with RID = 6'b00\_0101 → Route to Master 0

RID = 6'b01\_0101 → Route to Master 1

### 2.5.3 Block Diagram

## ID Management Architecture CAM/FIFO Transaction Tracking



## *ID Management Architecture*

*Figure 2.5: ID management architecture showing CAM and FIFO modes for transaction tracking and response routing.*

### 2.5.4 Bridge ID Width Calculation

#### Formula

$$\text{BID\_WIDTH} = \text{clog2}(\text{NUM\_MASTERS})$$

Where  $\text{clog2}(x) = \text{ceiling}(\log_2(x))$

#### Examples

|             |                    |          |                       |
|-------------|--------------------|----------|-----------------------|
| 2 masters:  | $\text{clog2}(2)$  | = 1 bit  | (BID: 0, 1)           |
| 3 masters:  | $\text{clog2}(3)$  | = 2 bits | (BID: 00, 01, 10)     |
| 4 masters:  | $\text{clog2}(4)$  | = 2 bits | (BID: 00, 01, 10, 11) |
| 5 masters:  | $\text{clog2}(5)$  | = 3 bits | (BID: 000-100)        |
| 8 masters:  | $\text{clog2}(8)$  | = 3 bits | (BID: 000-111)        |
| 16 masters: | $\text{clog2}(16)$ | = 4 bits | (BID: 0000-1111)      |

#### ID Width Growth

Configuration: 4 masters, external ARID\_WIDTH = 4

$$\begin{aligned}\text{Internal ARID\_WIDTH} &= \text{BID\_WIDTH} + \text{External ARID\_WIDTH} \\ &= 2 + 4 \\ &= 6 \text{ bits}\end{aligned}$$

Slave sees 6-bit IDs

Master sees 4-bit IDs

Bridge translates between them

### 2.5.5 ID Injection (Request Path)

#### Read Address Channel (AR)

```
// ID injection at master adapter
logic [BID_WIDTH-1:0] master_bid;           // Constant per master
logic [ARID_WIDTH-1:0] external_arid;        // From master
logic [TOTAL_ARID_WIDTH-1:0] internal_arid;  // To crossbar

assign master_bid = MASTER_INDEX; // M0=0, M1=1, M2=2, M3=3
assign internal_arid = {master_bid, external_arid};

// Example: Master 2, ARID = 4'h7
// internal_arid = {2'b10, 4'h7} = 6'b10_0111
```

```

Write Address Channel (AW)
// ID injection for write transactions
logic [BID_WIDTH-1:0] master_bid;
logic [AWID_WIDTH-1:0] external_awid;
logic [TOTAL_AWID_WIDTH-1:0] internal_awid;

assign internal_awid = {master_bid, external_awid};

// Note: AWID and ARID widths can differ per master

```

## Injection Timing

**Combinatorial** (default): - ID concatenation done in same cycle as VALID assertion - Zero latency overhead - May contribute to critical path in high-frequency designs

**Registered** (optional): - Add pipeline register after ID injection - +1 cycle latency - Breaks critical path

## 2.5.6 ID Extraction (Response Path)

### Read Data Channel (R)

```

// ID extraction at crossbar response router
logic [TOTAL_RID_WIDTH-1:0] internal_rid;      // From slave
logic [BID_WIDTH-1:0] extracted_bid;           // Upper bits
logic [RID_WIDTH-1:0] external_rid;            // Lower bits

// Extract Bridge ID
assign extracted_bid = internal_rid[TOTAL_RID_WIDTH-1:RID_WIDTH];

// Strip Bridge ID for master
assign external_rid = internal_rid[RID_WIDTH-1:0];

// Route based on BID
always_comb begin
    case (extracted_bid)
        2'b00: master0_rvalid = slave_rvalid;
        2'b01: master1_rvalid = slave_rvalid;
        2'b10: master2_rvalid = slave_rvalid;
        2'b11: master3_rvalid = slave_rvalid;
    endcase
end

```

### Write Response Channel (B)

```

// Similar extraction for write responses
logic [TOTAL_BID_WIDTH-1:0] internal_bid;
logic [BID_WIDTH-1:0] extracted_bid;

```

```

logic [BID_WIDTH-1:0] external_bid;

assign extracted_bid = internal_bid[TOTAL_BID_WIDTH-1:BID_WIDTH];
assign external_bid = internal_bid[BID_WIDTH-1:0];

// Route to originating master

```

## 2.5.7 Content Addressable Memory (CAM)

### When to Use CAM

**Simple Configurations** (No CAM needed): - Single master (BID unnecessary) - Direct ID mapping suffices - Lower resource usage

**Complex Configurations** (CAM beneficial): - Many masters (>8) - Out-of-order responses common - Multiple outstanding transactions per master - ID reordering within slave

### CAM Structure

Entry Format:

| Internal ID<br>(6-12 bits) | Master Index<br>(2-4 bits) | Transaction<br>Type<br>(R/W) | Timestamp<br>(counter) | Valid<br>(1b) |
|----------------------------|----------------------------|------------------------------|------------------------|---------------|
| 10b                        | 3b                         | 1b                           | 16b                    | 1b            |

Total per entry: ~30 bits  
 CAM depth: 16-64 entries typical

### CAM Operations

#### Allocation (Request Path):

1. Master issues AR/AW transaction
2. Find free CAM entry (Valid = 0)
3. Write entry:
  - Internal ID = {BID, External\_ID}
  - Master Index = BID
  - Transaction Type = Read or Write
  - Timestamp = Current cycle count
  - Valid = 1
4. Forward transaction to slave

#### Lookup (Response Path):

1. Slave returns R/B response with Internal ID
2. CAM lookup: Search for matching Internal ID
3. Extract Master Index from matching entry

4. Route response to Master[Master Index]
5. If LAST: Deallocate entry (Valid = 0)

**Associative Search:** - All entries checked in parallel - 1-cycle lookup (registered CAM) - Match found → Returns master index - No match → Error condition (protocol violation)

## 2.5.8 CAM Implementation

### Parallel CAM (Fast, Resource-Intensive)

```
// Parallel CAM structure (16 entries)
typedef struct packed {
    logic valid;
    logic [TOTAL_ID_WIDTH-1:0] internal_id;
    logic [BID_WIDTH-1:0] master_idx;
    logic txn_type; // 0=read, 1=write
    logic [15:0] timestamp;
} cam_entry_t;

cam_entry_t cam [0:15];

// Parallel search
logic [15:0] match;
logic [3:0] match_idx;

for (genvar i = 0; i < 16; i++) begin
    assign match[i] = cam[i].valid &&
                      (cam[i].internal_id == response_id);
end

// Priority encode first match
assign match_idx = find_first_set(match);
assign routed_master = cam[match_idx].master_idx;
```

### Sequential CAM (Slow, Resource-Efficient)

```
// Sequential CAM search (multi-cycle)
logic [3:0] search_idx;
logic searching;

always_ff @(posedge clk) begin
    if (search_start) begin
        search_idx <= 0;
        searching <= 1'b1;
    end else if (searching) begin
        if (cam[search_idx].valid &&
            cam[search_idx].internal_id == response_id) begin
            // Match found
            routed_master <= cam[search_idx].master_idx;
        end
    end
end
```

```

        searching <= 1'b0;
    end else if (search_idx == 15) begin
        // No match found (error)
        searching <= 1'b0;
        error <= 1'b1;
    end else begin
        search_idx <= search_idx + 1;
    end
end
end

```

**Trade-off:** - Parallel: 1-cycle, ~2000 LEs for 16 entries - Sequential: 16 cycles, ~200 LEs for 16 entries

## 2.5.9 Outstanding Transaction Limits

### Configuration

```

[bridge]
enable_cam = true
cam_depth = 16          # Max outstanding transactions

[[masters]]
name = "cpu"
max_outstanding_reads = 8  # Per-master limit
max_outstanding_writes = 8

```

### Enforcement

When CAM full:

1. Master issues new request
2. Check CAM for free entry
3. If full:
  - ARREADY/AWREADY = 0 (backpressure)
  - Wait for response to free entry
4. When entry freed (RLAST/BVALID):
  - Accept new request

### Sizing Guidelines

CAM Depth =  $\Sigma(\text{max\_outstanding per master}) + \text{Safety margin}$

Example: 4 masters, 4 outstanding each

Minimum CAM depth =  $4 \times 4 = 16$  entries

Recommended depth = 20 entries (25% margin)

## 2.5.10 Resource Utilization

### ID Injection/Extraction Only

**Per Master** (no CAM):

Logic Elements: ~20-50 LEs  
Registers: ~10 regs

Simple bit concatenation and extraction  
Minimal overhead

### With CAM

**CAM Resources** (16 entries, 6-bit IDs):

Parallel CAM:

Logic Elements: ~2000 LEs  
Registers: ~500 regs  
Block RAM: 0 (distributed)

BRAM-Based CAM:

Logic Elements: ~500 LEs  
Registers: ~100 regs  
Block RAM: 1-2 KB

Sequential CAM:

Logic Elements: ~200 LEs  
Registers: ~100 regs  
Block RAM: 0

### Scaling

| CAM Depth  | Parallel  | BRAM      | Sequential |
|------------|-----------|-----------|------------|
| 16 entries | ~2000 LEs | ~500 LEs  | ~200 LEs   |
| 32 entries | ~4000 LEs | ~800 LEs  | ~250 LEs   |
| 64 entries | ~8000 LEs | ~1200 LEs | ~300 LEs   |

## 2.5.11 Timing Characteristics

### ID Injection Latency

**Combinatorial** (default): - 0 cycles overhead - Part of adapter pipeline stage

**Registered**: - +1 cycle latency - Breaks timing path

### ID Extraction/Routing Latency

**Direct Decode** (no CAM): - 0 cycles (combinatorial BID extraction)

**Parallel CAM:** - 1 cycle (registered search)

**Sequential CAM:** - 1-N cycles (where N = CAM depth) - Average: N/2 cycles

### End-to-End Impact

Configuration: No CAM, combinatorial ID management

Request: Master → +0 cycles → Crossbar

Response: Slave → +0 cycles → Master

Total: 0 cycles overhead

Configuration: Parallel CAM, registered

Request: Master → +1 cycle (allocation) → Crossbar

Response: Slave → +1 cycle (lookup) → Master

Total: 2 cycles overhead

## 2.5.12 Configuration Parameters

### ID Management Configuration (TOML)

#### [bridge]

```
num_masters = 4
enable_cam = false          # Use CAM for ID tracking
cam_type = "parallel"       # "parallel", "bram", "sequential"
cam_depth = 16               # Outstanding transaction capacity
```

#### [bridge.id\_management]

```
registered_injection = false    # Register ID injection (+1 cycle)
registered_extraction = false   # Register ID extraction (+1 cycle)
enable_timeout = true          # Detect hung transactions
timeout_cycles = 10000         # Cycles before timeout error
```

#### [[masters]]

```
name = "cpu"
arid_width = 4                  # External ID width
awid_width = 4
max_outstanding_reads = 8        # CAM allocation limit
max_outstanding_writes = 8
```

## 2.5.13 Debug and Observability

### Recommended Debug Signals

#### ID Injection:

- Master BID assignments (constant per master)
- External IDs (from masters)
- Internal IDs (after injection)

#### ID Extraction:

- Internal IDs (from slaves)
- Extracted BIDs

- External IDs (to masters)

CAM (if enabled):

- CAM occupancy (number of valid entries)
- Allocation/deallocation events
- Search hits/misses
- Timeout events

### Performance Counters

- Total transactions tracked
- CAM hit rate
- CAM miss rate (should be 0)
- Average CAM occupancy
- Peak CAM occupancy
- Timeout errors
- ID width overhead (bits added per transaction)

## 2.5.14 Common Issues and Debug

**Symptom:** Response goes to wrong master

**Check:** - BID assignment (verify each master has unique BID) - ID width calculation (BID\_WIDTH correct?) - Bit slicing in extraction logic - CAM contents (if used)

**Symptom:** CAM full, transactions stalling

**Check:** - CAM depth vs. outstanding transaction count - Are responses being received? (slave responsive?) - Timeout threshold (too short?) - Leaked entries (not deallocated after LAST)

**Symptom:** CAM miss errors

**Check:** - Slave returning incorrect IDs - ID corruption in transit - Race condition (deallocation before response complete)

## 2.5.15 Verification Considerations

### Test Scenarios

#### 1. Basic ID Injection/Extraction:

- Single master, single transaction
- Verify BID added correctly
- Verify BID stripped before response delivery
- Check external ID unchanged

#### 2. Multi-Master ID Isolation:

- Multiple masters with same external IDs
- Verify responses route to correct master
- Check BID uniqueness prevents collisions

### 3. CAM Capacity:

- Issue max\_outstanding transactions
- Verify backpressure when CAM full
- Issue responses to free entries
- Verify new transactions accepted

### 4. Out-of-Order Responses:

- Issue transactions: ID=1, ID=2, ID=3
- Return responses: ID=3, ID=1, ID=2
- Verify CAM correctly routes each

### 5. CAM Timeout:

- Issue transaction
- Slave doesn't respond within timeout\_cycles
- Verify timeout error signaled
- Verify CAM entry deallocated (or flagged)

## 2.5.16 Future Enhancements

### Planned Features

- **Dynamic CAM Depth:** Runtime adjustment based on utilization
- **Per-Master CAM Partition:** Guaranteed entries per master
- **ID Compression:** Reduce internal ID width for slaves with limited ID support
- **Transaction Ordering:** CAM tracks issue order for reordering enforcement

### Under Consideration

- **Multi-Level CAM:** Hierarchical for large transaction counts
- **TCAM Support:** Partial ID matching (wildcards)
- **Error Injection:** Debug mode to test error handling
- **CAM Mirrors:** Redundant CAMs for fail-safe operation

---

**Related Sections:** - Section 2.1: Master Adapter (BID injection location) - Section 2.3: Crossbar Core (BID extraction, response routing) - Section 2.8: Response Routing (detailed routing logic) - Section 3.2: Master Port Interface (ID width specifications)

## 2.6 Width Conversion

Width Conversion is the mechanism by which the bridge adapts between masters and slaves with different data bus widths. The bridge uses a 64-bit internal data path, with conversion logic at the master and slave interfaces to support narrower or wider external widths.

### 2.6.1 Purpose and Function

Width conversion performs the following critical functions:

1. **Data Path Adaptation:** Converts between different data widths (8, 16, 32, 64, 128, 256 bits)
2. **Burst Splitting:** Divides wide transactions into multiple narrow transactions
3. **Burst Merging:** Combines multiple narrow beats into fewer wide beats
4. **Strobe Mapping:** Translates write strobes across width boundaries
5. **Address Alignment:** Adjusts addresses for width differences

### 2.6.2 Internal Data Width

#### 64-Bit Standard

The bridge uses **64-bit (8-byte) internal data width** for the crossbar:

Rationale:

- Balance between resource usage and performance
- Common width for modern embedded processors
- Efficient for both 32-bit and 64-bit masters
- Reasonably sized multiplexers in crossbar

#### Width Conversion Locations

Master (32-bit) → [Upsizer] → Crossbar (64-bit) → [Downsizer] → Slave (32-bit)

Master (64-bit) → [No conversion] → Crossbar (64-bit) → [No conversion] → Slave (64-bit)

Master (128-bit) → [Downsizer] → Crossbar (64-bit) → [Upsizer] → Slave (128-bit)

## 2.6.3 Block Diagram



### Width Conversion Architecture

Figure 2.6: Width conversion architecture showing data upsizing and downsizing with beat count adjustment and strobe handling.

## 2.6.4 Upsizing (Narrow to Wide)

### Overview

**Upsizing** converts narrow data to wide data by buffering multiple narrow beats into a single wide beat.

Example: 32-bit master → 64-bit crossbar  
Master beat 0: 32'hAAAAA\_BBBB (bytes 3:0)  
Master beat 1: 32'hCCCC\_DDDD (bytes 7:4)  
→ Crossbar beat: 64'hCCCC\_DDDD\_AAAA\_BBBB

### Write Upsizing

#### Burst of 4 (32-bit) → Burst of 2 (64-bit):

Master AW: AWADDR = 0x1000, AWLEN = 3, AWSIZE = 2 (4 bytes)

Master W beats:

W[0]: WDATA = 0xAAAAA\_BBBB, WSTRB = 4'b1111, WLAST = 0  
W[1]: WDATA = 0xCCCC\_DDDD, WSTRB = 4'b1111, WLAST = 0  
W[2]: WDATA = 0xEEEEE\_FFFF, WSTRB = 4'b1111, WLAST = 0  
W[3]: WDATA = 0x1111\_2222, WSTRB = 4'b1111, WLAST = 1

Crossbar AW: AWADDR = 0x1000, AWLEN = 1, AWSIZE = 3 (8 bytes)

Crossbar W beats:

W[0]: WDATA = 0xCCCC\_DDDD\_AAAA\_BBBB, WSTRB = 8'b1111\_1111, WLAST = 0  
W[1]: WDATA = 0x1111\_2222\_EEEE\_FFFF, WSTRB = 8'b1111\_1111, WLAST = 1

### Read Upsizing

#### Burst of 8 (32-bit) → Burst of 4 (64-bit):

Master AR: ARADDR = 0x2000, ARLEN = 7, ARSIZE = 2 (4 bytes)

Crossbar AR: ARADDR = 0x2000, ARLEN = 3, ARSIZE = 3 (8 bytes)

Crossbar R beats:

R[0]: RDATA = 0x1111\_2222\_3333\_4444, RLAST = 0  
R[1]: RDATA = 0x5555\_6666\_7777\_8888, RLAST = 0  
R[2]: RDATA = 0x9999\_AAAA\_BBBB\_CCCC, RLAST = 0  
R[3]: RDATA = 0xDDDD\_EEEE\_FFFF\_0000, RLAST = 1

Master R beats (split from crossbar):

R[0]: RDATA = 0x3333\_4444, RLAST = 0 (from R[0] low)  
R[1]: RDATA = 0x1111\_2222, RLAST = 0 (from R[0] high)  
R[2]: RDATA = 0x7777\_8888, RLAST = 0 (from R[1] low)  
R[3]: RDATA = 0x5555\_6666, RLAST = 0 (from R[1] high)  
R[4]: RDATA = 0xBBBB\_CCCC, RLAST = 0 (from R[2] low)  
R[5]: RDATA = 0x9999\_AAAA, RLAST = 0 (from R[2] high)

R[6]: RDATA = 0xFFFF\_0000, RLAST = 0 (from R[3] low)  
R[7]: RDATA = 0xDDDD\_EEEE, RLAST = 1 (from R[3] high)

### Strobe Mapping (Upsizing)

32-bit WSTRB → 64-bit WSTRB:

Beat 0 (lower 32 bits):

32-bit WSTRB = 4'b1010 → 64-bit WSTRB = 8'b0000\_1010

Beat 1 (upper 32 bits):

32-bit WSTRB = 4'b1111 → 64-bit WSTRB = 8'b1111\_0000

Combined:

64-bit WSTRB = 8'b1111\_1010

### 2.6.5 Downsizing (Wide to Narrow)

#### Overview

**Downsizing** converts wide data to narrow data by splitting a single wide beat into multiple narrow beats.

Example: 128-bit master → 64-bit crossbar

Master beat: 128'h1111\_2222\_3333\_4444\_5555\_6666\_7777\_8888  
→ Crossbar beat 0: 64'h5555\_6666\_7777\_8888 (lower)  
→ Crossbar beat 1: 64'h1111\_2222\_3333\_4444 (upper)

#### Write Downsizing

##### Burst of 2 (128-bit) → Burst of 4 (64-bit):

Master AW: AWADDR = 0x3000, AWLEN = 1, AWSIZE = 4 (16 bytes)

Master W beats:

W[0]: WDATA = 0xAAAA... (128 bits), WSTRB = 16'hFFFF, WLAST = 0  
W[1]: WDATA = 0xBBB... (128 bits), WSTRB = 16'hFFFF, WLAST = 1

Crossbar AW: AWADDR = 0x3000, AWLEN = 3, AWSIZE = 3 (8 bytes)

Crossbar W beats:

W[0]: WDATA = 0xAAAA\_low(64), WSTRB = 8'hFF, WLAST = 0  
W[1]: WDATA = 0xAAAA\_high(64), WSTRB = 8'hFF, WLAST = 0  
W[2]: WDATA = 0xBBB\_low(64), WSTRB = 8'hFF, WLAST = 0  
W[3]: WDATA = 0xBBB\_high(64), WSTRB = 8'hFF, WLAST = 1

#### Read Downsizing

##### Burst of 1 (128-bit) → Burst of 2 (64-bit):

Master AR: ARADDR = 0x4000, ARLEN = 0, ARSIZE = 4 (16 bytes)

Crossbar AR: ARADDR = 0x4000, ARLEN = 1, ARSIZE = 3 (8 bytes)

Crossbar R beats:

R[0]: RDATA = 0x1111\_2222\_3333\_4444, RLAST = 0  
R[1]: RDATA = 0x5555\_6666\_7777\_8888, RLAST = 1

Master R beat (merged):

R[0]: RDATA = 0x5555\_6666\_7777\_8888\_1111\_2222\_3333\_4444, RLAST = 1

### Strobe Mapping (Downsizing)

128-bit WSTRB → 64-bit WSTRB (split):

128-bit WSTRB = 16'hFF00\_0000

Beat 0 (bytes 7:0):

64-bit WSTRB = 8'b0000\_1111\_1111\_0000 = 8'h0F0 → 8'hF0

Beat 1 (bytes 15:8):

64-bit WSTRB = 8'b1111\_0000\_1111\_0000 = 8'hF0F0 >> 8 → 8'hF0

## 2.6.6 Address Alignment

### Address Adjustment for Width

When changing widths, addresses must align to the new width:

32-bit (4-byte) aligned address: 0x1004

Upsized to 64-bit (8-byte): 0x1000 (round down to 8-byte boundary)

Narrow access at 0x1004 within 64-bit word:

- Byte offset = 0x1004 & 0x7 = 4
- Data at bytes [7:4] of 64-bit word
- Lower bytes [3:0] unused (WSTRB = 8'b1111\_0000)

### Unaligned Access Handling

Master: 32-bit, unaligned access at 0x1002

Problem: Not aligned to 4-byte boundary

Options:

1. Error response (strict mode)
2. Round down address, use WSTRB for actual bytes
3. Split into multiple aligned accesses

Bridge default: Option 2 (use WSTRB)

## 2.6.7 Burst Length Adjustment

### Length Calculation

Formula:

$$\text{New\_Length} = (\text{Old\_Length} + 1) \times (\text{Old\_Width} / \text{New\_Width}) - 1$$

Example: Upsizing 32→64

Old: AWLEN = 7 (8 beats), WIDTH = 32

New: AWLEN =  $(7+1) \times (32/64) - 1 = 8 \times 0.5 - 1 = 3$  (4 beats)

Example: Downsizing 128→64

Old: ARLEN = 3 (4 beats), WIDTH = 128

New: ARLEN =  $(3+1) \times (128/64) - 1 = 4 \times 2 - 1 = 7$  (8 beats)

### Odd Burst Lengths

When burst doesn't divide evenly:

Example: 3 beats of 32-bit → 64-bit

3 beats × 4 bytes = 12 bytes total

12 bytes / 8 bytes per beat = 1.5 beats

Solution:

- Beat 0: Full 64-bit (8 bytes)
- Beat 1: Partial 64-bit (4 bytes, WSTRB = 8'b0000\_1111)
- AWLEN = 1 (2 beats)

## 2.6.8 Resource Utilization

### Per-Converter Resources

#### 32→64 Upsizer:

Logic Elements: ~300 LEs

Registers: ~100 regs (buffering + FSM)

Block RAM: 0

Breakdown:

- Data buffer (32 bits): ~32 regs
- Strobe buffer: ~4 regs
- Beat counter: ~8 regs
- Control FSM: ~50 LEs, ~20 regs
- MUX logic: ~150 LEs

#### 64→32 Downsizer:

Logic Elements: ~250 LEs

Registers: ~120 regs

Block RAM: 0

Similar structure but includes split logic

### 128→64 Downsizer:

Logic Elements: ~400 LEs  
Registers: ~180 regs  
Block RAM: 0

Larger data paths, more complex MUX

### Scaling

Resource usage scales primarily with:  
- **Width ratio:** 2:1 vs. 4:1 vs. 8:1 conversion  
- **Data width:** 128-bit vs. 256-bit buffers  
- **Buffering depth:** Single vs. multi-beat buffering

## 2.6.9 Timing Characteristics

### Latency

**Upsizing** (combining beats):  
- Buffering latency: 1-2 cycles (wait for N narrow beats)  
- First beat output: After receiving required narrow beats  
- Example: 32→64 requires 2 master beats before 1 crossbar beat

**Downsizing** (splitting beats):  
- Minimal latency: 1 cycle (register stage)  
- First beat output: Immediately (split from wide beat)  
- Example: 128→64 outputs first 64-bit beat immediately

### Throughput

#### Upsizing Throughput:

32→64 upsizing:  
Master: 32 bits/cycle = 4 bytes/cycle  
Crossbar: 64 bits per 2 cycles = 4 bytes/cycle (same)  
No throughput loss

#### Downsizing Throughput:

128→64 downsizing:  
Master: 128 bits/cycle = 16 bytes/cycle  
Crossbar: 64 bits/cycle = 8 bytes/cycle  
Throughput halved (crossbar becomes bottleneck)

## 2.6.10 Configuration Parameters

### Width Conversion Configuration (TOML)

#### [bridge]

```
internal_data_width = 64      # Crossbar width
enable_width_conversion = true # Allow width differences
```

#### [[masters]]

```
name = "cpu_32bit"
data_width = 32                # Narrower than crossbar (upsizing)
addr_width = 32
```

#### [[masters]]

```
name = "dma_64bit"
data_width = 64                # Matches crossbar (no conversion)
addr_width = 32
```

#### [[masters]]

```
name = "gpu_128bit"
data_width = 128               # Wider than crossbar (downsizing)
addr_width = 36
```

#### [[slaves]]

```
name = "memory_64bit"
data_width = 64                # Matches crossbar (no conversion)
```

#### [[slaves]]

```
name = "periph_32bit"
data_width = 32                # Narrower than crossbar (downsizing)
```

## 2.6.11 Debug and Observability

### Recommended Debug Signals

#### Upsizer:

- Beat accumulator (partial wide beat being assembled)
- Beat counter (position in burst)
- Buffer valid (accumulated beats ready)
- Strobe accumulation

#### Downsizer:

- Beat splitter state (which sub-beat currently outputting)
- Remaining beats to output
- Source data register

## Common Issues and Debug

**Symptom:** Data corruption after width conversion

**Check:** - Byte ordering (endianness) - Strobe mapping (correct bytes enabled) - Address alignment - Burst length calculation

**Symptom:** Stalls after width conversion

**Check:** - Buffer depths (upsizer needs buffering) - Backpressure propagation - LAST signaling

**Symptom:** Incorrect burst lengths

**Check:** - Length calculation formula - Odd burst handling - SIZE field matching width

## 2.6.12 Verification Considerations

### Test Scenarios

#### 1. Power-of-2 Width Ratios:

- 32→64 (2:1)
- 64→128 (1:2)
- 32→128 (4:1)

#### 2. Various Burst Lengths:

- Single beat (ALEN=0)
- Even bursts (ALEN=3, 7, 15)
- Odd bursts (ALEN=1, 5, 9)

#### 3. Sub-Word Accesses:

- Byte writes (WSTRB with individual bytes)
- Half-word, word accesses
- Unaligned accesses

#### 4. Mixed Widths:

- 32-bit master → 64-bit crossbar → 32-bit slave
- 128-bit master → 64-bit crossbar → 32-bit slave
- All width combinations

## 2.6.13 Performance Considerations

### Conversion Overhead

**Upsizing Overhead:** - Buffering latency: +1-2 cycles - Throughput maintained - Best for burst-oriented masters

**Downsizing Overhead:** - Split latency: +1 cycle - Throughput reduced by width ratio - Can bottleneck wide masters

## Optimization Strategies

1. **Match Common Widths:** Design masters/slaves to match internal width (64-bit)
2. **Burst Sizing:** Use appropriate burst lengths for width ratios
3. **Parallel Paths:** Multiple 32-bit masters can aggregate to 64-bit crossbar bandwidth
4. **Selective Conversion:** Only convert where necessary

## 2.6.14 Future Enhancements

### Planned Features

- **Configurable Internal Width:** Support 32, 64, 128, 256-bit crossbar
- **Multi-Beat Buffering:** Deeper upsizing buffers for smoother flow
- **Byte Rotation:** Handle unaligned access more efficiently
- **Pipelined Conversion:** Multi-stage for high frequency

### Under Consideration

- **Asymmetric Widths:** Different widths for read vs. write paths
- **Sparse Data Optimization:** Skip unused bytes in conversions
- **Tagged Data:** Metadata preservation through width conversion
- **Zero-Copy Bypass:** Direct routing for matching widths

---

**Related Sections:** - Section 2.1: Master Adapter (upsizing location) - Section 2.3: Crossbar Core (internal data width) - Section 3.2: Master Port Interface (width specifications) - Section 3.3: Slave Port Interface (width specifications)

## 2.7 Protocol Conversion

Protocol Conversion enables the bridge to interface with slaves using different bus protocols. While the bridge internally uses AXI4, it can convert to simpler protocols like APB (Advanced Peripheral Bus) for low-bandwidth peripheral access.

### 2.7.1 Purpose and Function

Protocol conversion performs the following critical functions:

1. **Protocol Translation:** Converts AXI4 transactions to target protocol (e.g., APB)

2. **Handshake Mapping:** Translates ready/valid to protocol-specific handshakes
3. **Burst Decomposition:** Breaks AXI bursts into single-beat target transactions
4. **Response Mapping:** Converts protocol-specific responses back to AXI responses
5. **Timing Adaptation:** Handles different timing requirements between protocols

## 2.7.2 Supported Protocols

### Current Support (Phase 2)

**AXI4 (Native):** - Full AXI4 protocol - No conversion required - Maximum performance

**AXI4-Lite (Master-Side):** - Simplified AXI4 subset - Single-beat transactions only (ARLEN=0, AWLEN=0) - Converts to full AXI4 for crossbar - Common for control/status registers

**APB (Slave-Side):** - APB3 and APB4 support - For low-bandwidth peripherals - Simplified handshaking

### Future Support (Phase 2+)

- **AHB:** Advanced High-performance Bus
- **Wishbone:** Open-source bus standard
- **Custom:** User-defined protocols

## 2.7.3 Conversion Architecture Overview





## 2.7.4 Block Diagram



### Protocol Conversion Architecture

Figure 2.7: Protocol conversion showing AXI4 to APB conversion with state machine, burst decomposition, and response mapping.

## 2.7.5 AXI4-Lite to AXI4 Conversion (Master-Side)

### AXI4-Lite Protocol Overview

AXI4-Lite is a simplified subset of AXI4 designed for simple control/status register access:

#### Key Differences from Full AXI4:

- **FIXED burst length:** ARLEN/AWLEN always = 0 (single beat)
- **FIXED burst size:** No SIZE field, always full data width
- **FIXED burst type:** No BURST field, always INCR
- **NO exclusive access:** No LOCK support
- **NO unaligned transfers:** Address must be aligned
- **Simpler ID:** Typically 1-4 bits (fewer outstanding transactions)

#### Similarities to AXI4:

- Same 5 channels: AR, R, AW, W, B
- Same valid/ready handshaking
- Same response codes: OKAY, SLVERR, DECERR, EXOKAY
- Same data widths: 32 or 64 bits typically

## Conversion Requirements

To adapt AXI4-Lite masters to the full AXI4 crossbar, the adapter must:

1. **Add Missing Signals:** Provide default values for burst-related signals
2. **Validate Constraints:** Ensure single-beat assumption holds
3. **Pass-Through Simplicity:** Most signals connect directly

## Signal Mapping

// AXI4-Lite to AXI4 Signal Mapping

```
// AR Channel (Read Address)
// AXI4-Lite Input          AXI4 Crossbar Output
axi4lite_arvalid      →  axi4_arvalid
axi4lite_arready       ←  axi4_arready
axi4lite_araddr        →  axi4_araddr
axi4lite_arprot        →  axi4_arprot
axi4lite_arid (opt)   →  axi4_arid

// Added by adapter (constants):
beat
                    axi4_arlen    = 8'h00      // Always 1
buf
                    axi4_arsize   = log2(DW/8) // Full width
                    axi4_arburst  = 2'b01      // INCR
                    axi4_arlock   = 1'b0       // No lock
                    axi4_arcache  = 4'b0000    // Device non-
                    axi4_arqos    = 4'h0       // No QoS
                    axi4_arregion = 4'h0       // Region 0

// R Channel (Read Data)
// AXI4 Crossbar Input      AXI4-Lite Output
axi4_rvalid           →  axi4lite_rvalid
axi4_rready           ←  axi4lite_rready
axi4_rdata            →  axi4lite_rdata
axi4_rresp             →  axi4lite_rresp
axi4_rid (opt)        →  axi4lite_rid (opt)

// Discarded by adapter:
axi4_rlast             // Always 1 for single beat

// AW Channel (Write Address)
axi4lite_awvalid      →  axi4_awvalid
axi4lite_awready       ←  axi4_awready
axi4lite_awaddr        →  axi4_awaddr
axi4lite_awprot        →  axi4_awprot
axi4lite_awid (opt)   →  axi4_awid
```

```

// Added by adapter:
    axi4_awlen      = 8'h00
    axi4_awsize     = log2(DW/8)
    axi4_awburst    = 2'b01
    axi4_awlock     = 1'b0
    axi4_awcache    = 4'b0000
    axi4_awqos      = 4'h0
    axi4_awregion   = 4'h0

// W Channel (Write Data)
axi4lite_wvalid      -> axi4_wvalid
axi4lite_wready      <- axi4_wready
axi4lite_wdata       -> axi4_wdata
axi4lite_wstrb       -> axi4_wstrb

// Added by adapter:
                        axi4_wlast      = 1'b1      // Always last

// B Channel (Write Response)
axi4_bvalid          -> axi4lite_bvalid
axi4_bready          <- axi4lite_bready
axi4_bresp           -> axi4lite_bresp
axi4_bid (opt)       -> axi4lite_bid (opt)

```

## Implementation

The AXI4-Lite adapter is extremely simple, primarily providing constant values:

```

// AXI4-Lite to AXI4 Adapter (simplified)
module axi4lite_to_axi4_adapter #(
    parameter ADDR_WIDTH = 32,
    parameter DATA_WIDTH = 64,
    parameter ID_WIDTH = 4           // Optional, often 0 for AXI4-Lite
) (
    input logic clk,
    input logic rst_n,

    // AXI4-Lite Master Interface
    input logic                      lite_arvalid,
    output logic                     lite_arready,
    input logic [ADDR_WIDTH-1:0]      lite_araddr,
    input logic [2:0]                 lite_arprot,
    input logic [ID_WIDTH-1:0]        lite_arid,    // Optional

    output logic                     lite_rvalid,
    input logic                      lite_rready,
    output logic [DATA_WIDTH-1:0]    lite_rdata,
    output logic [1:0]               lite_rresp,

```

```

output logic [ID_WIDTH-1:0]      lite_rid,      // Optional
input logic                      lite_awvalid,
output logic                   lite_awready,
input logic [ADDR_WIDTH-1:0]    lite_awaddr,
input logic [2:0]                 lite_awprot,
input logic [ID_WIDTH-1:0]      lite_awid,     // Optional

input logic                      lite_wvalid,
output logic                   lite_wready,
input logic [DATA_WIDTH-1:0]   lite_wdata,
input logic [DATA_WIDTH/8-1:0]  lite_wstrb,

output logic                   lite_bvalid,
input logic                    lite_bready,
output logic [1:0]              lite_bresp,
output logic [ID_WIDTH-1:0]    lite_bid,      // Optional

// Full AXI4 Crossbar Interface
output logic                  axi4_arvalid,
input logic                   axi4_arready,
output logic [ADDR_WIDTH-1:0]  axi4_araddr,
output logic [7:0]              axi4_arlen,
output logic [2:0]              axi4_arsize,
output logic [1:0]              axi4_arburst,
output logic                  axi4_arlock,
output logic [3:0]              axi4_arcache,
output logic [2:0]              axi4_arprot,
output logic [3:0]              axi4_arqos,
output logic [3:0]              axi4_arregion,
output logic [ID_WIDTH-1:0]    axi4_arid,

input logic                   axi4_rvalid,
output logic                  axi4_rready,
input logic [DATA_WIDTH-1:0]  axi4_rdata,
input logic [1:0]              axi4_rresp,
input logic                  axi4_rlast,
input logic [ID_WIDTH-1:0]    axi4_rid,

// ... (AW, W, B channels similar)
);

// AR Channel: Pass-through with constants
assign axi4_arvalid = lite_arvalid;
assign lite_arready = axi4_arready;
assign axi4_araddr = lite_araddr;
assign axi4_arprot = lite_arprot;
assign axi4_arid = lite_arid;

// Constants for single-beat burst

```

```

assign axi4_arlen    = 8'h00;           // 1 beat
assign axi4_arsize   = $clog2(DATA_WIDTH/8); // Full width
assign axi4_arburst  = 2'b01;           // INCR
assign axi4_arlock   = 1'b0;            // No lock
assign axi4_arcache  = 4'b0000;          // Device non-buf
assign axi4_arqos    = 4'h0;             // No QoS
assign axi4_arregion = 4'h0;             // Region 0

// R Channel: Pass-through, ignore rlast
assign lite_rvalid = axi4_rvalid;
assign axi4_rready = lite_rready;
assign lite_rdata  = axi4_rdata;
assign lite_rresp   = axi4_rresp;
assign lite_rid    = axi4_rid;
// axi4_rlast ignored (always 1 for single beat)

// AW Channel: Similar to AR
assign axi4_awvalid  = lite_awvalid;
assign lite_awready = axi4_awready;
assign axi4_awaddr   = lite_awaddr;
assign axi4_awprot   = lite_awprot;
assign axi4_awid    = lite_awid;
assign axi4_awlen    = 8'h00;
assign axi4_awsize   = $clog2(DATA_WIDTH/8);
assign axi4_awburst  = 2'b01;
assign axi4_awlock   = 1'b0;
assign axi4_awcache  = 4'b0000;
assign axi4_awqos    = 4'h0;
assign axi4_awregion = 4'h0;

// W Channel: Pass-through, add wlast
assign axi4_wvalid = lite_wvalid;
assign lite_wready = axi4_wready;
assign axi4_wdata   = lite_wdata;
assign axi4_wstrb   = lite_wstrb;
assign axi4_wlast   = 1'b1;           // Always last

// B Channel: Pass-through
assign lite_bvalid = axi4_bvalid;
assign axi4_bready = lite_bready;
assign lite_bresp   = axi4_bresp;
assign lite_bid     = axi4_bid;

endmodule

```

## Resource Utilization

### AXI4-Lite Adapter Resources:

Logic Elements: ~50-100 LEs (minimal, mostly wiring)  
Registers: ~50 regs (if skid buffers added)  
Block RAM: 0

Breakdown:

- Signal pass-through: ~20 LEs (buffering)
- Constant generation: ~10 LEs
- Optional skid buffers: ~50 regs (for timing)

Note: Most implementations are purely combinatorial wire assignments with optional pipeline registers.

## Performance Impact

**Latency:** - **Zero-latency** (combinatorial) if no pipeline stages - **1-2 cycles** if skid buffers added for timing - No protocol conversion overhead

**Throughput:** - **1 transaction per cycle** (same as native AXI4) - No degradation for single-beat transactions - Limited by AXI4-Lite's single-beat constraint

## Configuration

```
[[masters]]  
name = "control_processor"  
protocol = "axi4lite"          # Specify simplified protocol  
channels = "rw"                # Full read-write  
arid_width = 0                 # Often no ID in AXI4-Lite  
awid_width = 0  
addr_width = 32  
data_width = 32                 # Typically 32 or 64 bits
```

## Common Issues and Debug

### Issue 1: Burst Detected on AXI4-Lite

Symptom: ARLEN/AWLEN != 0 on AXI4-Lite interface  
Cause: Master not properly configured as AXI4-Lite  
Check: Verify master only issues single-beat transactions

### Issue 2: Unaligned Addresses

Symptom: ARADDR/AWADDR not aligned to data width  
Cause: AXI4-Lite requires full-width aligned access  
Check: Address[log2(DW/8)-1:0] should be zero

### Issue 3: rlast/wlast Handling

Symptom: Master expects rlast/wlast but doesn't have them  
Cause: True AXI4-Lite interface omits these signals  
Solution: Adapter provides wlast=1 to crossbar, strips rlast

## 2.7.6 AXI4 to APB Conversion (Slave-Side)

### APB Protocol Overview

APB is a simple, low-power bus protocol:

Characteristics:

- Single address phase
- Single data phase
- No burst support (one transfer per operation)
- Minimal logic
- Low power consumption
- Suitable for peripherals: UARTs, timers, GPIOs

### APB Signals

Address Phase:

- |              |                                     |
|--------------|-------------------------------------|
| PADDR[N-1:0] | - Address bus                       |
| PSEL         | - Slave select                      |
| PENABLE      | - Enable (2nd cycle of transfer)    |
| PWRITE       | - Write direction (1=write, 0=read) |

Data Phase (Write):

- |                |                             |
|----------------|-----------------------------|
| PWDATA[N-1:0]  | - Write data                |
| PSTRB[N/8-1:0] | - Write strobes (APB4 only) |

Data Phase (Read):

- |               |             |
|---------------|-------------|
| PRDATA[N-1:0] | - Read data |
|---------------|-------------|

Response:

- |         |                                     |
|---------|-------------------------------------|
| PREADY  | - Slave ready (can extend transfer) |
| PSLVERR | - Slave error (APB3+)               |

### APB State Machine

APB requires a 2-phase handshake:

IDLE:

- Wait for AXI request (ARVALID or AWVALID)
- PSEL = 0, PENABLE = 0

SETUP:

- Assert PSEL = 1
- Drive PADDR, PWRITE, PWDATA (if write)
- PENABLE = 0
- Duration: 1 cycle

ACCESS:

- Assert PENABLE = 1
- Wait for PREADY = 1
- Capture PRDATA (if read) or PSLVERR

- Can extend multiple cycles if PREADY = 0

Complete:

- De-assert PSEL, PENABLE
- Return to IDLE or SETUP (if more beats)

## Read Transaction Conversion

### AXI4 Read:

Cycle 0: ARVALID=1, ARADDR=0x100, ARLEN=3 (4 beats)

Cycle 1: ARREADY=1

Cycles 2-5: R beats returning

**Converted to APB (4 separate APB reads):**

Beat 0:

Cycle 0: PSEL=1, PENABLE=0, PADDR=0x100, PWRITE=0 (SETUP)

Cycle 1: PSEL=1, PENABLE=1, wait PREADY (ACCESS)

Cycle 2: PREADY=1, capture PRDATA → First R beat

Beat 1:

Cycle 3: PSEL=1, PENABLE=0, PADDR=0x104 (SETUP)

Cycle 4: PSEL=1, PENABLE=1, wait PREADY (ACCESS)

Cycle 5: PREADY=1, capture PRDATA → Second R beat

... (beats 2 and 3 similar)

**Latency:** 2-3 cycles per beat (SETUP + ACCESS + ready)

## Write Transaction Conversion

### AXI4 Write:

Cycle 0: AWVALID=1, AWADDR=0x200, AWLEN=1 (2 beats)

Cycle 1: AWREADY=1

Cycle 1: WVALID=1, WDATA=0xAAAA\_BBBB, WLAST=0

Cycle 2: WREADY=1

Cycle 2: WVALID=1, WDATA=0xCCCC\_DDDD, WLAST=1

Cycle 3: WREADY=1

Cycle 4: BVALID=1, BRESP=OKAY

**Converted to APB (2 separate APB writes):**

Beat 0:

Cycle 0: PSEL=1, PENABLE=0, PADDR=0x200, PWRITE=1,  
PADATA=0xAAAA\_BBBB

Cycle 1: PSEL=1, PENABLE=1, wait PREADY

Cycle 2: PREADY=1 → First write complete

Beat 1:

```

Cycle 3: PSEL=1, PENABLE=0, PADDR=0x204, PWRITE=1,
PWDATA=0xCCCC_DDDD
Cycle 4: PSEL=1, PENABLE=1, wait PREADY
Cycle 5: PREADY=1 → Second write complete, return B

```

## Burst Handling

APB does not support bursts, so:

- AXI Burst: AWLEN = 15 (16 beats)
- 16 separate APB transfers
- Address increments per AWBURST type:
  - INCR: Addr += SIZE each beat
  - WRAP: Wrapping within boundary
  - FIXED: Same address each beat

## Response Mapping

APB → AXI Response Translation:

```

PREADY=1, PSLVERR=0 → RRESP/BRESP = 2'b00 (OKAY)
PREADY=1, PSLVERR=1 → RRESP/BRESP = 2'b10 (SLVERR)

```

Timeout (PREADY stuck at 0):

After N cycles → RRESP/BRESP = 2'b11 (DECERR)

## 2.7.5 Implementation

### AXI4-to-APB Converter FSM

```

// Simplified AXI4-to-APB converter
typedef enum logic [2:0] {
    IDLE,
    AR_SETUP,
    AR_ACCESS,
    AW_SETUP,
    W_ACCESS,
    B_RESPONSE
} state_t;

state_t state, next_state;

always_ff @(posedge clk) begin
    if (!rst_n) state <= IDLE;
    else state <= next_state;
end

always_comb begin
    next_state = state;

    case (state)

```

```

IDLE: begin
    if (arvalid) next_state = AR_SETUP;
    else if (awvalid) next_state = AW_SETUP;
end

AR_SETUP: begin
    next_state = AR_ACCESS; // 1 cycle SETUP
end

AR_ACCESS: begin
    if (pready) begin
        if (more_beats) next_state = AR_SETUP; // Next beat
        else next_state = IDLE;
    end
end

AW_SETUP: begin
    if (wvalid) next_state = W_ACCESS;
end

W_ACCESS: begin
    if (pready) begin
        if (!wlast) next_state = AW_SETUP; // Next beat
        else next_state = B_RESPONSE;
    end
end

B_RESPONSE: begin
    if (bready) next_state = IDLE;
end
endcase
end

// APB signal generation
assign psel = (state != IDLE);
assign penable = (state == AR_ACCESS || state == W_ACCESS);
assign pwrite = (state == AW_SETUP || state == W_ACCESS);



### Address Generation


// Address increment for burst
logic [ADDR_WIDTH-1:0] current_addr;
logic [7:0] beat_count;

always_ff @(posedge clk) begin
    if (state == IDLE) begin
        current_addr <= arvalid ? araddr : awaddr;
        beat_count <= 0;
    end else if ((state == AR_ACCESS || state == W_ACCESS) && pready)
begin
        beat_count <= beat_count + 1;
    end
end

```

```

        case (burst_type)
            2'b01: current_addr <= current_addr + (1 << size); // 
INCR
            2'b10: current_addr <= wrap_address(current_addr); // 
WRAP
            2'b00: current_addr <= current_addr; // 
FIXED
        endcase
    end
end

assign paddr = current_addr;

```

## 2.7.6 Resource Utilization

### APB Converter Resources

#### Per APB Slave Interface:

Logic Elements: ~400 LEs  
Registers: ~150 regs  
Block RAM: 0

#### Breakdown:

- FSM control: ~100 LEs, ~20 regs
- Address generation: ~80 LEs, ~40 regs
- Burst counter: ~50 LEs, ~20 regs
- Response accumulation: ~80 LEs, ~30 regs
- Data path MUX: ~90 LEs, ~40 regs

### Scaling

Adding APB slaves: - Linear scaling: +~400 LEs per APB slave - Shared address decoder logic - Independent per-slave FSMs

## 2.7.7 Timing Characteristics

### Latency

#### APB Read Latency (per beat):

Best case: 2 cycles (SETUP + ACCESS with PREADY=1)  
Typical: 3-5 cycles (if slave extends with PREADY=0)  
Worst case: Configurable timeout (e.g., 1000 cycles)

#### For 8-beat AXI burst:

Total:  $8 \times 3 = 24$  cycles typical

## APB Write Latency (per beat):

Similar to read: 2-5 cycles per beat

## Throughput

### Severely Limited:

APB: ~0.3-0.5 transactions/cycle (due to 2-3 cycle protocol)  
AXI4: 1 transaction/cycle (burst mode)

APB suitable only for low-bandwidth peripherals

## 2.7.8 Configuration Parameters

### Protocol Conversion Configuration (TOML)

```
[[slaves]]
name = "uart_peripheral"
protocol = "apb"          # "axi4", "apb", "ahb" (future)
base_address = 0xF000_0000
size = 0x1000
data_width = 32
apb_timeout = 1000        # Cycles before timeout error

[[slaves]]
name = "ddr_memory"
protocol = "axi4"          # Native, no conversion
base_address = 0x8000_0000
size = 0x4000_0000
data_width = 64
```

## 2.7.9 Debug and Observability

### Recommended Debug Signals

APB Converter:

- FSM state
- APB phase (SETUP, ACCESS)
- Beat counter (progress through burst)
- PREADY timeout counter
- Response accumulation (for burst)

APB Bus:

- PSEL, PENABLE, PWRITE
- PADDR, PWDATA, PRDATA
- PREADY, PSLVERR

## Common Issues and Debug

**Symptom:** APB slave not responding (timeout)

**Check:** - PREADY signal (stuck at 0?) - APB slave clock/reset - PSEL assertion - Timeout threshold

**Symptom:** Data corruption on APB

**Check:** - SETUP phase duration (should be 1 cycle) - PENABLE assertion timing - Data sampling on correct cycle

**Symptom:** Burst to APB takes too long

**Check:** - Burst length (consider limiting ARLEN/AWLEN) - APB slave response time (PREADY) - Alternative: Use AXI4 slave instead

## 2.7.10 Verification Considerations

### Test Scenarios

#### 1. Single APB Transfer:

- AXI ARLEN=0 (1 beat) → 1 APB read
- Verify SETUP → ACCESS sequence
- Check PREADY handling

#### 2. APB Burst Decomposition:

- AXI AWLEN=7 (8 beats) → 8 APB writes
- Verify address increment
- Check each beat completes before next

#### 3. APB PREADY Extension:

- Slave holds PREADY=0 for N cycles
- Verify converter waits
- Check no data corruption

#### 4. APB Error Response:

- Slave asserts PSLVERR
- Verify mapped to AXI SLVERR
- Check error propagated to master

#### 5. APB Timeout:

- Slave never asserts PREADY
- Verify timeout after N cycles
- Check DECERR response

## 2.7.11 Performance Considerations

### When to Use APB

**Good Use Cases:** - Low-speed peripherals (UART, GPIO, timers) - Infrequent accesses - Simple register interfaces - Power-sensitive designs

**Poor Use Cases:** - High-bandwidth devices - Burst-intensive masters - Performance-critical paths - Memory interfaces

### APB vs. AXI4 Comparison

| Feature       | APB            | AXI4                  |
|---------------|----------------|-----------------------|
| Complexity    | Simple         | Complex               |
| Throughput    | Low (~0.3/cyc) | High (1/cyc burst)    |
| Latency/beat  | 2-5 cycles     | 1 cycle               |
| Burst Support | No             | Yes (up to 256)       |
| Resources     | ~400 LEs       | Native (no converter) |
| Power         | Very low       | Moderate              |
| Use Case      | Peripherals    | Memory, DMA           |

## 2.7.12 Mixed Protocol Bridges

### Example Configuration

```
# Bridge with mixed protocols
[bridge]
num_masters = 2
num_slaves = 3

[[masters]]
name = "cpu"
protocol = "axi4"

[[masters]]
name = "dma"
protocol = "axi4"

[[slaves]]
name = "ddr_memory"
protocol = "axi4"          # High bandwidth

[[slaves]]
name = "sram"
protocol = "axi4"          # Medium bandwidth

[[slaves]]
name = "peripherals"
protocol = "apb"            # Low bandwidth, simple
```

## Routing Optimization

CPU → DDR Memory: AXI4-to-AXI4 (native, fast)  
CPU → Peripherals: AXI4-to-APB (converted, slower)  
DMA → SRAM: AXI4-to-AXI4 (native, fast)  
DMA → Peripherals: AXI4-to-APB (rare, acceptable slowdown)

## 2.7.13 Master-Side vs Slave-Side Conversion

### Comparison

| Feature            | Master-Side (AXI4-Lite) | Slave-Side (APB)     |
|--------------------|-------------------------|----------------------|
| Complexity         | Very Simple             | Complex              |
| Resource Usage     | ~50-100 LEs             | ~400 LEs             |
| Latency Added      | 0-1 cycles              | 2-5 cycles/beat      |
| Throughput Impact  | None                    | Severe (3x slower)   |
| Burst Handling     | Single beat only        | Decompose to singles |
| State Machine      | None (combinatorial)    | Multi-state FSM      |
| Buffering Required | Optional (timing)       | Essential            |
| Use Case           | Control registers       | Peripherals          |

### When to Use Each

**AXI4-Lite Master Adapter:** - Simple control/status register interfaces - Low-complexity masters (MCUs, simple CPUs) - Minimal resource overhead acceptable - No burst performance needed

**APB Slave Converter:** - Legacy peripheral integration - Very simple slave devices (GPIO, timers) - Low-bandwidth acceptable - Power optimization critical

## 2.7.14 Future Protocol Support

### Planned Features

**AXI4-Lite:** - Subset of AXI4 - No burst support (ALEN=0 always) - Simpler than full AXI4 - Common for control registers

**AHB (AMBA High-performance Bus):** - More capable than APB - Pipeline support - Burst support - Suitable for moderate-bandwidth peripherals

**Wishbone:** - Open-source bus standard - Common in FPGA designs - Multiple addressing modes - Configurable data widths

### Under Consideration

- **Custom Protocol:** User-defined through configuration
- **Stream Interface:** AXI4-Stream for data streaming
- **PCIe TLP:** For PCIe endpoint integration

- **CHI:** ARM's Coherent Hub Interface

## 2.7.14 Best Practices

### Design Recommendations

1. **Limit APB Burst Lengths:** Configure masters to use short bursts to APB slaves
2. **Proper Timeouts:** Set realistic timeout values for APB slaves
3. **Protocol Matching:** Use native AXI4 where possible, APB only when necessary
4. **Address Map Planning:** Group APB peripherals together for efficient decoding
5. **Width Matching:** Match APB data width to peripheral requirements

### Performance Tips

Inefficient: 256-beat AXI burst → APB  
 $256 \text{ beats} \times 3 \text{ cyc/beat} = 768 \text{ cycles}$

Better: Limit to 4-beat bursts → APB  
 64 bursts of 4 beats each  
 Still long, but more manageable

Best: Use AXI4-Lite slave for registers  
 No burst, but native protocol

---

**Related Sections:** - Section 2.3: Crossbar Core (protocol integration point) -  
 Section 3.3: Slave Port Interface (APB signal specifications) - Chapter 4:  
 Programming (configuring protocol conversion) - Appendix A: Generator Deep  
 Dive (protocol converter generation)

## 2.8 Response Routing

Response Routing is the mechanism by which slave responses (read data and write acknowledgments) are directed back to the originating master. This system uses Bridge IDs, demultiplexing logic, and optional CAM structures to ensure responses reach the correct destination.

### 2.8.1 Purpose and Function

Response routing performs the following critical functions:

1. **Response Direction:** Routes R and B channel responses to correct master
2. **ID-Based Routing:** Uses extracted Bridge IDs to determine destination
3. **Multi-Slave Merging:** Combines responses from multiple slaves to single master
4. **Flow Control:** Manages backpressure from master on response channels
5. **Error Propagation:** Ensures error responses reach originating master

## 2.8.2 Block Diagram



### Response Routing Architecture

Figure 2.8: Response routing architecture showing B and R channel merging from slaves and ID-based demultiplexing to masters.

## 2.8.3 BID Extraction

### Simple Extraction (No CAM)

For configurations where direct BID mapping suffices:

```
// Extract Bridge ID from response ID
logic [TOTAL_RID_WIDTH-1:0] slave_rid;           // From slave
logic [BID_WIDTH-1:0] extracted_bid;             // Master index
logic [RID_WIDTH-1:0] external_rid;              // To master

// Upper bits = Bridge ID, lower bits = original ID
assign extracted_bid = slave_rid[TOTAL_RID_WIDTH-1:RID_WIDTH];
assign external_rid = slave_rid[RID_WIDTH-1:0];

// Route response based on BID
logic [NUM_MASTERS-1:0] master_rvalid;
always_comb begin
    master_rvalid = '0; // Default: no master selected
    master_rvalid[extracted_bid] = slave_rvalid;
end
```

### CAM-Based Extraction

For complex configurations with OOO or ID reordering:

```
// CAM lookup for response routing
logic [TOTAL_RID_WIDTH-1:0] response_id;
logic [BID_WIDTH-1:0] master_index;
logic cam_hit;

// Search CAM for matching transaction
assign {cam_hit, master_index} = cam_lookup(response_id);

// Route response to found master
if (cam_hit) begin
    master_rvalid[master_index] = slave_rvalid;
end else begin
    // Error: No matching transaction (protocol violation)
    error_response();
end
```

## 2.8.4 Response Demultiplexing

### Per-Master Response Paths

Each master has dedicated response channels from the demultiplexer:

Master 0 Response:

- M0\_RVALID, M0\_RREADY, M0\_RDATA, M0\_RID, M0\_RRESP, M0\_RLAST
- M0\_BVALID, M0\_BREADY, M0\_BID, M0\_BRESP

Source: Responses from any slave with BID=0

### Demux Implementation

```
// Response demultiplexer (4 masters, 3 slaves)
// Combines responses from all slaves, routes to correct master

// Slave responses (multiple sources)
logic s0_rvalid, s1_rvalid, s2_rvalid;
logic [63:0] s0_rdata, s1_rdata, s2_rdata;
logic [5:0] s0_rid, s1_rid, s2_rid; // Includes BID

// Master responses (multiple destinations)
logic m0_rvalid, m1_rvalid, m2_rvalid, m3_rvalid;
logic [63:0] m0_rdata, m1_rdata, m2_rdata, m3_rdata;
logic [3:0] m0_rid, m1_rid, m2_rid, m3_rid; // BID stripped

// Demux logic per slave
for (genvar s = 0; s < 3; s++) begin
    logic [1:0] bid;
    assign bid = slave_rid[s][5:4]; // Extract BID

    always_comb begin
        case (bid)
            2'b00: begin
                if (slave_rvalid[s] && !m0_busy) begin
                    m0_rvalid = 1'b1;
                    m0_rdata = slave_rdata[s];
                    m0_rid = slave_rid[s][3:0]; // Strip BID
                end
            end
            2'b01: /* Route to M1 */
            2'b10: /* Route to M2 */
            2'b11: /* Route to M3 */
        endcase
    end
end
```

## 2.8.5 Multi-Slave Response Merging

### Problem Statement

When multiple slaves can respond to same master simultaneously:

Scenario:

Master 0 has outstanding transactions to Slave 0 and Slave 1

Both slaves return R responses in same cycle

Problem: Master 0 can only accept one R response per cycle  
Solution: Arbitrate between slave responses

### Response Arbitration

```
// Arbitrate between multiple slave responses for same master
logic [2:0] slave_has_response; // Which slaves have responses for M0
logic [1:0] selected_slave; // Which slave wins arbitration

// Detect responses destined for M0
for (genvar s = 0; s < 3; s++) begin
    assign slave_has_response[s] = slave_rvalid[s] &&
        (extract_bid(slave_rid[s]) ==
2'b00);
end

// Round-robin arbiter for fairness
always_ff @(posedge clk) begin
    if (!rst_n) begin
        selected_slave <= 0;
    end else if (~m0_rready && (|slave_has_response)) begin
        // Rotate selection for fairness
        if (slave_has_response[selected_slave])
            ; // Keep current
        else
            selected_slave <= find_next_requesting_slave();
    end
end

// MUX selected slave's response to master
assign m0_rvalid = slave_has_response[selected_slave];
assign m0_rdata = slave_rdata[selected_slave];
assign m0_rid = strip_bid(slave_rid[selected_slave]);
```

### Response Buffering

Optional: Add FIFOs to buffer pending responses:

Purpose:

- Prevent response loss during arbitration conflicts
- Allow slaves to return responses even if master busy
- Smooth out bursty response patterns

Configuration:

per\_master\_response\_fifo\_depth = 4-16 entries

Trade-off:

- + No response loss

- + Better throughput
- Additional resources (~200 LEs per FIFO)
- +1-2 cycle latency

## 2.8.6 Backpressure Management

### Ready Signal Routing

Master RREADY/BREADY must reach correct slave:

```
// Route master ready back to slaves
logic m0_rready; // From master
logic [2:0] slave_rready; // To slaves

// Only selected slave sees master's ready
for (genvar s = 0; s < 3; s++) begin
    assign slave_rready[s] = (selected_slave == s) ? m0_rready : 1'b0;
end

// Non-selected slaves see ready = 0 (backpressure)
```

### Stall Conditions

Response path can stall when:

1. Master not ready ( $M\_RREADY = 0$ )
  - Selected slave stalls ( $S\_RREADY = 0$ )
  - Other slaves buffer or stall
2. Arbitration conflict (multiple slaves ready)
  - Non-selected slaves stalled
  - Selected slave proceeds
3. Response FIFO full
  - Slave stalled until space available

## 2.8.7 Response Path Latency

### End-to-End R Channel

Slave R response → Bridge → Master R channel

Components:

1. Slave response valid
2. BID extraction (0 cycles, combinatorial)
3. Demux routing (0-1 cycles)
4. Response arbitration (1 cycle if conflict)
5. Optional FIFO (0-1 cycles)
6. Master adapter (1 cycle, skid buffer)

Total: 2-4 cycles typical

### End-to-End B Channel

Similar to R channel:

Slave B → Extraction → Demux → Arbitration → Master B

Total: 2-4 cycles

## 2.8.8 Error Response Routing

### Slave Error Responses

When slave returns error:

RRESP = 2'b10 (SLVERR) or 2'b11 (DECERR)

BRESP = 2'b10 (SLVERR) or 2'b11 (DECERR)

Routing:

- Extract BID normally
- Route to originating master
- Preserve error code
- Master sees error response

### Out-of-Range Error Responses

When router generates error for OOR address:

Process:

1. Router captures OOR request
2. Generates internal error response
  - Uses cached request ID (with BID)
  - Sets RRESP/BRESP = DECERR
3. Injects into response path
4. Routes to master using BID

## 2.8.9 Resource Utilization

### Response Router Resources

#### 4 masters, 3 slaves (64-bit data):

Logic Elements: ~1500-2000 LEs

Registers: ~400-600 regs

Block RAM: 0 (unless FIFOs enabled)

Breakdown:

- BID extraction (3 slaves): ~150 LEs
- Demux logic (4 masters): ~600 LEs, ~150 regs
- Response arbitration: ~300 LEs, ~100 regs

- Backpressure routing: ~200 LEs, ~50 regs
- Control FSMs: ~250 LEs, ~100 regs

Optional FIFOs (4 × 8 entries): +800 LEs, +2KB BRAM

## Scaling

Resource usage scales with:

- Number of masters: Linear (one demux path per master)
- Number of slaves: Linear (one extraction per slave)
- Data width: Linear (wider data = wider demux)
- FIFO depth: Linear (deeper = more BRAM)

## 2.8.10 Configuration Parameters

### Response Routing Configuration (TOML)

#### [bridge]

```
num_masters = 4
num_slaves = 3
```

#### [bridge.response\_routing]

```
enable_response_fifos = false      # Buffer responses per master
fifo_depth = 8                      # Entries per FIFO
response_arbiter_type = "round_robin" # "round_robin",
                                         "fixed_priority"
registered_demux = false           # Register demux output (+1 cycle)

# Per-master response credits (optional)
[[masters]]
name = "cpu"
max_response_credits = 16          # Outstanding responses allowed
```

## 2.8.11 Debug and Observability

### Recommended Debug Signals

BID Extraction:

- Slave RID/BID (from each slave)
- Extracted BID (master index)
- External RID/BID (to master, BID stripped)

Response Demux:

- Demux select signals (which master per slave)
- Response valid per master
- Response valid per slave

Arbitration:

- Conflict detection (multiple responses for same master)
- Arbiter grant (which slave wins)
- Stall counters

- FIFOs (if enabled):
- FIFO occupancy per master
  - FIFO full/empty flags
  - FIFO overflow errors

### **Performance Counters**

- Responses routed per master
- Arbitration conflicts (cycles with multiple responses to same master)
- Response stall cycles (master not ready)
- Average response latency per master
- FIFO overflow counts
- CAM hits/misses (if CAM enabled)

## **2.8.12 Common Issues and Debug**

**Symptom:** Response not reaching master

**Check:** - BID extraction (correct bit slicing?) - Master index (BID → master mapping) - Demux select logic - Backpressure (is master READY?)

**Symptom:** Response goes to wrong master

**Check:** - BID values (verify each master has unique BID) - BID width calculation (clog2 correct?) - CAM contents (if used) - ID corruption in transit

**Symptom:** Response ordering incorrect

**Check:** - CAM lookup (should preserve order) - Arbitration fairness (round-robin working?) - Multiple outstanding transactions per master

**Symptom:** Response loss

**Check:** - FIFO overflows (enable FIFOs if needed) - Dropped responses during arbitration conflicts - Protocol violations (slave not following AXI rules)

## **2.8.13 Verification Considerations**

### **Test Scenarios**

#### **1. Single Master, Single Slave:**

- Basic routing test
- Verify BID extraction and stripping
- Check response reaches correct master

#### **2. Multiple Masters, Single Slave:**

- Interleaved responses
- Verify each response routes to correct master
- Check BID uniqueness prevents collisions

### **3. Single Master, Multiple Slaves:**

- Simultaneous responses from multiple slaves
- Verify arbitration fairness
- Check no response loss

### **4. All Masters, All Slaves:**

- Maximum stress test
- All masters issuing to all slaves
- Responses returning in various orders
- Verify correct routing under heavy load

### **5. OOO Responses:**

- Issue transactions: ID=1, ID=2, ID=3
- Return responses: ID=3, ID=1, ID=2
- Verify CAM correctly routes each
- Check response order at master

### **6. Backpressure Test:**

- Master asserts RREADY=0
- Verify slave stalls (RREADY=0 propagated)
- Master asserts RREADY=1
- Verify responses flow

## **2.8.14 Performance Optimization**

### **Techniques**

#### **1. Registered Demux:**

Trade-off: +1 cycle latency for timing closure  
Best for: High-frequency designs, many masters

#### **2. Response FIFOs:**

Trade-off: +2KB BRAM, +1-2 cycle latency  
Benefit: Prevent response loss, smooth conflicts  
Best for: High-throughput systems

#### **3. Distributed Arbitration:**

Implementation: Per-master arbiters instead of global  
Benefit: Reduced logic depth, better timing  
Best for: >8 masters

#### **4. Priority Response Routing:**

Implementation: High-priority masters get preference in conflicts  
Benefit: Guaranteed low latency for critical masters  
Best for: Real-time systems

## 2.8.15 Advanced Features

### Response Reordering (Future)

Allow bridge to reorder responses for efficiency:

Scenario:

Master issues: Req A (slow slave), Req B (fast slave)  
Fast slave responds first  
Bridge can deliver B before A (if IDs different)

Benefit: Reduced latency for fast responses

Requirement: CAM to track outstanding in-order constraints

### Response Coalescing (Future)

Combine multiple responses into fewer beats:

Scenario:

Multiple single-beat reads to narrow slave  
Bridge coalesces into fewer wide beats

Benefit: Reduced overhead

Complexity: High (requires buffering and alignment)

### Response QoS (Future)

Prioritize responses based on master QoS:

Feature: High-QoS masters get response priority

Implementation: Weighted arbitration in response path

Use case: RT masters need bounded response latency

## 2.8.16 Timing Considerations

### Critical Paths

Common critical paths in response routing:

1. BID extraction → Demux select → Master RDATA  
Depth: ~8-12 logic levels
2. Slave RVALID → Arbitration → Master RVALID  
Depth: ~6-10 logic levels
3. Master RREADY → Backpressure → Slave RREADY  
Depth: ~5-8 logic levels

## Optimization Strategies

- Register demux outputs (+1 cycle, breaks path)
- Pipeline arbitration (+1-2 cycles, multi-stage)
- Use CAM for complex ID scenarios (parallel lookup)
- Limit response buffering depth (less MUX levels)

## 2.8.17 Future Enhancements

### Planned Features

- **Dynamic Response Buffering:** Adjust FIFO depth based on utilization
- **Response Ordering Hints:** Masters specify ordering requirements
- **Multi-Cycle Arbitration:** Pipelined for >16 masters
- **Response Compression:** Pack narrow responses into wide beats

### Under Consideration

- **Response Speculation:** Predictive routing before full ID available
- **Virtual Response Channels:** Separate paths for different QoS classes
- **Response Mirroring:** Duplicate responses for redundancy
- **Cross-Clock Response:** Async response paths with CDC

---

**Related Sections:** - Section 2.3: Crossbar Core (response path architecture) - Section 2.5: ID Management (BID extraction details) - Section 2.4: Arbitration (response arbitration algorithms) - Section 3.2: Master Port Interface (response channel signals)

## 2.9 Error Handling

Error Handling encompasses all mechanisms by which the bridge detects, reports, and recovers from error conditions. This includes protocol violations, out-of-range addresses, timeout conditions, and configuration errors.

### 2.9.1 Purpose and Function

Error handling performs the following critical functions:

1. **Error Detection:** Identifies violations and abnormal conditions
2. **Error Response Generation:** Creates appropriate AXI error responses
3. **Error Reporting:** Logs and signals errors for debug
4. **Graceful Degradation:** Maintains system stability during errors
5. **Recovery Support:** Enables system recovery after errors

## 2.9.2 Error Categories

### Transaction Errors

#### Out-of-Range (OOR) Addresses:

Master requests address not mapped to any slave

Response: DECERR (Decode Error)

Action: Bridge generates error response internally

#### Protocol Violations:

Master violates AXI4 protocol rules

Examples:

- VALID deasserted while READY=0
- Incorrect burst parameters
- ID width mismatches

Response: Configurable (error log, SLVERR, or ignore)

#### Slave Errors:

Slave returns SLVERR or DECERR response

Action: Bridge forwards error to master unchanged

### System Errors

#### Timeout Conditions:

Slave fails to respond within configured timeout

Response: DECERR after timeout expires

Action: Force transaction completion

#### CAM Overflow:

Too many outstanding transactions (CAM full)

Response: Backpressure master (ARREADY/AWREADY=0)

Action: Wait for responses to free entries

#### Configuration Errors:

Invalid bridge configuration detected at generation time

Examples:

- Overlapping slave address ranges
- Invalid data width combinations
- Illegal parameter values

Response: Generation error, fail early

## 2.9.3 Block Diagram



## Error Handling System

Figure 2.9: Error handling system showing detection (OOR, protocol, timeout, CAM), generation, and logging subsystems.

## 2.9.4 Out-of-Range Address Handling

### Detection

Router identifies OOR when no slave address range matches:

```
// Address decode with OOR detection
logic [NUM_SLAVES-1:0] slave_match;
logic oor_detected;

for (genvar i = 0; i < NUM_SLAVES; i++) begin
    assign slave_match[i] = (addr >= SLAVE_BASE[i]) &&
                           (addr <= SLAVE_END[i]);
end

assign oor_detected = ~(|slave_match) && ~default_slave_en;
```

### Error Response Generation

#### Read OOR Response:

```
// Generate R response for OOR read
always_ff @(posedge clk) begin
    if (ar_oor_detected && arvalid && arready) begin
```

```

    // Capture transaction details
    oor_arid <= arid;
    oor_arlen <= arlen;
    oor_burst_count <= 0;
    oor_generating <= 1'b1;
end else if (oor_generating) begin
    // Generate R beats
    rvalid <= 1'b1;
    rdata <= OOR_DATA_PATTERN; // Configurable pattern
    rid <= oor_arid;
    rresp <= 2'b11; // DECERR
    rlast <= (oor_burst_count == oor_arlen);

    if (rready) begin
        oor_burst_count <= oor_burst_count + 1;
        if (rlast) oor_generating <= 1'b0;
    end
end
end

```

### Write OOR Response:

```

// Generate B response for OOR write
always_ff @(posedge clk) begin
    if (aw_oor_detected && awvalid && awready) begin
        // Capture write ID
        oor_awid <= awid;
        oor_w_count <= 0;
        oor_sinking_w <= 1'b1;
    end else if (oor_sinking_w) begin
        // Sink W data (accept and discard)
        wready <= 1'b1;

        if (wvalid && wlast) begin
            oor_sinking_w <= 1'b0;
            oor_gen_bresp <= 1'b1;
        end
    end else if (oor_gen_bresp) begin
        // Generate B response
        bvalid <= 1'b1;
        bid <= oor_awid;
        bresp <= 2'b11; // DECERR

        if (bready) begin
            oor_gen_bresp <= 1'b0;
        end
    end
end

```

```

Configurable Data Patterns
[bridge.error_handling]
oor_read_data_pattern = 0xDEADCAFE # Pattern for OOR reads
oor_response_latency = 2          # Cycles to generate response

```

## 2.9.5 Protocol Violation Handling

### Common Violations

#### VALID Dropped While READY=0:

AXI4 Rule: Once VALID asserted, must remain until READY  
 Violation: Master deasserts VALID before handshake  
 Detection: Track VALID history per channel  
 Action: Log error, optionally assert error signal

#### Incorrect Burst Parameters:

Violations:  
 - SIZE > DATA\_WIDTH  
 - LEN > 255  
 - Address not aligned to SIZE  
 - Burst crosses 4KB boundary

Detection: Parameter checking logic  
 Action: SLVERR response or reject transaction

#### ID Width Mismatch:

Violation: Master uses ID wider than configured  
 Detection: Check ID values against expected width  
 Action: Truncate or error (configurable)

### Protocol Checker Implementation

```

// AXI4 protocol checker (simplified)
module axi_protocol_checker #(
    parameter ID_WIDTH = 4,
    parameter ADDR_WIDTH = 32,
    parameter DATA_WIDTH = 64
) (
    input logic clk, rst_n,
    // AXI signals
    input logic arvalid, arready,
    input logic [ADDR_WIDTH-1:0] araddr,
    input logic [7:0] arlen,
    input logic [2:0] arsize,
    // Error outputs
    output logic protocol_error,
    output logic [7:0] error_code
);

```

```

// Check: VALID held until READY
logic arvalid_prev;
always_ff @(posedge clk) begin
    arvalid_prev <= arvalid;
    if (arvalid_prev && !arvalid && !arready) begin
        protocol_error <= 1'b1;
        error_code <= 8'h01; // VALID_DROPPED
    end
end

// Check: SIZE <= DATA_WIDTH
always_comb begin
    if (arvalid && (arsize > $clog2(DATA_WIDTH/8))) begin
        protocol_error = 1'b1;
        error_code = 8'h02; // INVALID_SIZE
    end
end

// Check: Address alignment
always_comb begin
    if (arvalid && (araddr[arsize-1:0] != 0)) begin
        protocol_error = 1'b1;
        error_code = 8'h03; // UNALIGNED_ADDR
    end
end

// Additional checks...
endmodule

```

## 2.9.6 Timeout Detection

### Per-Transaction Watchdog

```

// Timeout detector for outstanding transactions
typedef struct packed {
    logic valid;
    logic [TOTAL_ID_WIDTH-1:0] id;
    logic [15:0] timestamp;
    logic is_read; // 1=read, 0=write
} timeout_entry_t;

timeout_entry_t watchdog [0:15]; // Track up to 16 transactions
logic [15:0] global_timer;

always_ff @(posedge clk) begin
    if (!rst_n) begin
        global_timer <= 0;
    end else begin
        global_timer <= global_timer + 1;
    end
end

```

```

    // Check for timeouts
    for (int i = 0; i < 16; i++) begin
        if (watchdog[i].valid) begin
            logic [15:0] elapsed;
            elapsed = global_timer - watchdog[i].timestamp;

            if (elapsed > TIMEOUT_THRESHOLD) begin
                // Timeout detected
                timeout_error <= 1'b1;
                timeout_id <= watchdog[i].id;
                timeout_type <= watchdog[i].is_read ? "READ" :
                "WRITE";
            end
        end
    end

```

### Timeout Configuration

```

[bridge.error_handling]
enable_timeout = true
timeout_cycles = 10000          # Max cycles before timeout
timeout_action = "decerr"       # "decerr", "slverr", or "hang"
per_slave_timeout = [           # Optional per-slave overrides
    {name = "slow_periph", timeout = 50000},
    {name = "fast_mem", timeout = 1000}
]

```

## 2.9.7 Error Logging and Reporting

### Error Status Register

Error Status Register (Read/Clear) :

| Bits  | Description                    |
|-------|--------------------------------|
| 0     | 00R Read Error                 |
| 1     | 00R Write Error                |
| 2     | Protocol Violation             |
| 3     | Timeout Error                  |
| 4     | CAM Overflow                   |
| 5     | Slave Error (SLVERR received)  |
| 6     | Decode Error (DECERR received) |
| 7     | ID Match Error                 |
| 15:8  | Reserved                       |
| 31:16 | Error Count (saturating)       |

```

Error History Buffer
// Circular buffer for error history
typedef struct packed {
    logic [7:0] error_type;
    logic [31:0] address;
    logic [TOTAL_ID_WIDTH-1:0] id;
    logic [31:0] timestamp;
    logic [7:0] master_index;
    logic [7:0] slave_index;
} error_entry_t;

error_entry_t error_log [0:15]; // 16-entry history
logic [3:0] error_wr_ptr;

always_ff @(posedge clk) begin
    if (error_detected) begin
        error_log[error_wr_ptr] <= {
            error_type,
            error_addr,
            error_id,
            global_timer,
            error_master,
            error_slave
        };
        error_wr_ptr <= error_wr_ptr + 1;
    end
end

```

### Error Interrupts

Optional interrupt generation:

Interrupt Enable Register:

- OOR\_INT\_EN
- TIMEOUT\_INT\_EN
- PROTOCOL\_INT\_EN
- etc.

Interrupt when enabled error occurs

Clear on status register read or explicit clear

## 2.9.8 Resource Utilization

### Error Handling Resources

Logic Elements: ~800-1200 LEs

Registers: ~400-600 regs

Block RAM: 0-2 KB (if error logging enabled)

Breakdown:

- OOR detection/response: ~300 LEs, ~100 regs
- Protocol checkers: ~250 LEs, ~50 regs
- Timeout watchers: ~200 LEs, ~150 regs
- Error logging: ~150 LEs, ~100 regs
- Status registers: ~100 LEs, ~50 regs

Optional error log (16 entries): +2KB BRAM

## 2.9.9 Configuration Parameters

### Error Handling Configuration (TOML)

```
[bridge.error_handling]
# OOR handling
enable_oor_errors = true
oor_response_type = "decerr"      # "decerr", "slverr"
oor_read_pattern = 0xDEADCAFE    # Data pattern for OOR reads

# Timeout detection
enable_timeout = true
timeout_cycles = 10000
timeout_response = "decerr"

# Protocol checking
enable_protocol_checker = true
protocol_checker_mode = "log"      # "log", "error", "ignore"

# Error logging
enable_error_log = true
error_log_depth = 16                # Entries in circular buffer

# Error reporting
enable_error_interrupts = false
error_status_register_addr = 0xFFFF_FFF0 # Optional mapped register
```

## 2.9.10 Debug and Observability

### Recommended Debug Signals

#### Error Detection:

- OOR flags (per master)
- Protocol violation flags
- Timeout counters
- CAM occupancy

#### Error Response:

- Error response generation FSM states
- Active error responses
- Error data patterns

#### Error Logging:

- Error count
- Last error type
- Last error address
- Error log write pointer

## 2.9.11 Common Issues and Debug

**Symptom:** Unexpected DECERR responses

**Check:** - Address map configuration (gaps?) - Slave address ranges (overlaps?) - Default slave configuration - Timeout thresholds (too short?)

**Symptom:** System hangs on certain addresses

**Check:** - Timeout detection enabled? - Slave responsiveness - Protocol violations upstream - CAM overflow

**Symptom:** Error log filling quickly

**Check:** - What errors are occurring? - Master behavior (bad addresses?) - Slave configuration - Timeout thresholds

## 2.9.12 Verification Considerations

### Test Scenarios

#### 1. OOR Address Tests:

- Read from unmapped address
- Write to unmapped address
- Burst to partially mapped range
- Verify DECERR response

#### 2. Timeout Tests:

- Slave never responds
- Verify timeout after N cycles
- Check error logged
- Verify forced completion

#### 3. Protocol Violation Tests:

- Drop VALID before READY
- Invalid burst parameters
- Misaligned addresses
- Verify detection and response

#### 4. Error Recovery Tests:

- Error occurs
- System continues operating
- Subsequent transactions succeed
- Error status readable

## 5. Error Logging Tests:

- Generate multiple errors
- Verify all logged
- Check circular buffer wrap
- Verify timestamps

## 2.9.13 Recovery Mechanisms

### Automatic Recovery

#### 00R Errors:

- Generate error response
- Continue normal operation
- No intervention needed

#### Timeout Errors:

- Force transaction completion
- Free CAM entry
- Allow new transactions

#### Protocol Violations:

- Log error
- Complete/reject transaction
- Continue with next transaction

### Manual Recovery

#### CAM Stuck:

- Software reset via control register
- Clear CAM entries
- Restart affected masters

#### Persistent Errors:

- Read error log
- Identify root cause
- Reconfigure or reset subsystem

## 2.9.14 Safety and Reliability

### Error Containment

Goal: Prevent error propagation

#### Mechanisms:

- Isolate error to originating master
- Don't corrupt other masters' transactions
- Maintain crossbar integrity
- Log for post-mortem analysis

## **Fail-Safe Behavior**

On Critical Error:

1. Generate safe error response (DECERR)
2. Log error details
3. Assert error signal (optional)
4. Continue operation if possible
5. Halt only if configured (safety-critical mode)

## **Error Injection (Debug/Test)**

[bridge.debug]

```
enable_error_injection = true
```

```
[[bridge.debug.error_injection]]  
type = "oor"  
trigger_address = 0xBAD_ADDR  
action = "decerr"
```

```
[[bridge.debug.error_injection]]  
type = "timeout"  
trigger_after_n_cycles = 100  
target_slave = 2
```

## **2.9.15 Future Enhancements**

### **Planned Features**

- **Error Recovery FSM:** Automatic recovery from certain error conditions
- **Error Rate Limiting:** Prevent error storm from misbehaving master
- **Detailed Error Telemetry:** Capture full transaction context on error
- **Error Prediction:** Detect patterns indicating impending failures

### **Under Consideration**

- **ECC Protection:** Error correction for internal state
- **Redundant Checking:** Duplicate checkers for safety-critical
- **Watchdog Timers:** System-level health monitoring
- **Error Severity Levels:** Classify errors by impact

---

**Related Sections:** - Section 2.2: Slave Router (OOR detection) - Section 2.5: ID Management (CAM overflow) - Section 2.8: Response Routing (error response paths) - Chapter 5: Verification (error testing strategies)

## 3.1 AXI4 Signal Reference

This chapter provides a comprehensive reference for all AXI4 signals used in the bridge crossbar, including timing requirements, protocol rules, and common issues.

### 3.1.1 Overview

The AXI4 protocol consists of five independent channels: - **Write Address (AW)** - Master initiates write transaction - **Write Data (W)** - Master provides write data - **Write Response (B)** - Slave acknowledges write completion - **Read Address (AR)** - Master initiates read transaction - **Read Data (R)** - Slave returns read data

Each channel uses a **valid/ready** handshake mechanism for flow control.

### 3.1.2 Clock and Reset

#### Global Signals

```
input logic aclk;      // AXI4 clock - all signals sampled on rising edge
input logic aresetn;   // AXI4 reset - active low, asynchronous assert, synchronous deassert
```

**Reset Behavior:** - When aresetn is LOW, all interfaces are held in reset - All VALID signals must be driven LOW during reset - READY signals may be driven LOW or held at any value during reset - After aresetn goes HIGH, normal operation begins on the next rising clock edge

**Timing Requirements:** - Setup time: All AXI4 signals must be stable before rising edge of aclk - Hold time: All AXI4 signals must remain stable after rising edge of aclk - Reset assertion: aresetn can be asserted asynchronously - Reset deassertion: aresetn must be deasserted synchronously (stable before rising edge)

### 3.1.3 Write Address Channel (AW)

#### Purpose

The Write Address channel carries all information required to describe a write transaction, except the data itself.

#### Signals

```
// Write Address Channel (Master → Slave)
output logic awvalid;    // Write address valid
input  logic awready;    // Write address ready
```

```

output logic [ID_WIDTH-1:0] awid; // Write address ID
output logic [ADDR_WIDTH-1:0] awaddr; // Write address
output logic [7:0] awlen; // Burst length (0-255)
output logic [2:0] awszie; // Burst size (2^awszie
    bytes per beat)
output logic [1:0] awburst; // Burst type
output logic awlock; // Lock type
output logic [3:0] awcache; // Cache attributes
output logic [2:0] awprot; // Protection attributes
output logic [3:0] awqos; // Quality of Service
output logic [3:0] awregion; // Region identifier
output logic [USER_WIDTH-1:0] awuser; // User-defined sideband
    (optional)

```

## Signal Descriptions

### *awvalid*

- **Direction:** Master → Slave
- **Width:** 1 bit
- **Description:** Indicates write address channel signals are valid
- **Protocol Rules:**
  - Master asserts when address phase information is valid
  - Must remain asserted until awready is HIGH on rising edge
  - Once asserted, cannot be deasserted until handshake completes
  - Must be LOW during reset

### *awready*

- **Direction:** Slave → Master
- **Width:** 1 bit
- **Description:** Indicates slave is ready to accept address
- **Protocol Rules:**
  - Slave asserts when ready to accept address information
  - Can be asserted before, during, or after awvalid is asserted
  - Recommended to assert before awvalid for zero-wait-state transfers
  - Can toggle freely (no requirement to wait for awvalid)

### *awid*

- **Direction:** Master → Slave
- **Width:** ID\_WIDTH bits (typically 1-16)
- **Description:** Write transaction identifier
- **Protocol Rules:**

- Uniquely identifies write transactions from a master
- Transactions with different IDs can complete out-of-order
- Transactions with same ID must complete in order
- Bridge may append additional bits for multi-master routing
- Must remain stable while awvalid is asserted

### *awaddr*

- **Direction:** Master → Slave
- **Width:** ADDR\_WIDTH bits (typically 32 or 64)
- **Description:** Starting address of write burst
- **Protocol Rules:**
  - Byte address of the first beat in the burst
  - Must be aligned to awsize ( $\text{awaddr}[\text{awsize}-1:0] == 0$ )
  - Subsequent beat addresses calculated using awburst type
  - Must remain stable while awvalid is asserted
  - Bridge router uses bits to select target slave

### **Alignment Examples:**

```
awslice = 3'b000 (1 byte): awaddr[ 0:0] must be 0 (always aligned)
awslice = 3'b001 (2 bytes): awaddr[ 1:0] must be 0
awslice = 3'b010 (4 bytes): awaddr[ 2:0] must be 0
awslice = 3'b011 (8 bytes): awaddr[ 3:0] must be 0
awslice = 3'b100 (16 bytes): awaddr[ 4:0] must be 0
```

### *awlen*

- **Direction:** Master → Slave
- **Width:** 8 bits
- **Description:** Number of data beats in burst (actual beats = awlen + 1)
- **Protocol Rules:**
  - Valid range: 0 to 255 (1 to 256 beats)
  - awlen = 0 → 1 beat (single transfer)
  - awlen = 1 → 2 beats
  - awlen = 255 → 256 beats (maximum)
  - Must remain stable while awvalid is asserted

### **Common Burst Lengths:**

```
awlen = 0    → 1 beat   (single access)
awlen = 1    → 2 beats
awlen = 3    → 4 beats (common for line fill)
```

```
awlen = 7 → 8 beats  
awlen = 15 → 16 beats (common for cache line)
```

#### *awsiz*

- **Direction:** Master → Slave
- **Width:** 3 bits
- **Description:** Size of each beat in bytes ( $2^{\text{awsiz}}$ )
- **Protocol Rules:**
  - Valid range: 0 to  $\log_2(\text{DATA\_WIDTH}/8)$
  - Cannot exceed data bus width
  - All beats in burst use same size
  - Must remain stable while awvalid is asserted

#### **Size Encoding:**

```
awsiz = 3'b000 → 1 byte      (2^0)  
awsiz = 3'b001 → 2 bytes    (2^1)  
awsiz = 3'b010 → 4 bytes    (2^2)  
awsiz = 3'b011 → 8 bytes    (2^3)  
awsiz = 3'b100 → 16 bytes   (2^4)  
awsiz = 3'b101 → 32 bytes   (2^5)  
awsiz = 3'b110 → 64 bytes   (2^6)  
awsiz = 3'b111 → 128 bytes  (2^7)
```

#### **Data Width Constraints:**

```
32-bit bus: awsize ≤ 3'b010 (max 4 bytes)  
64-bit bus: awsize ≤ 3'b011 (max 8 bytes)  
128-bit bus: awsize ≤ 3'b100 (max 16 bytes)  
256-bit bus: awsize ≤ 3'b101 (max 32 bytes)
```

#### *awburst*

- **Direction:** Master → Slave
- **Width:** 2 bits
- **Description:** Burst type - how addresses increment
- **Protocol Rules:**
  - Controls address calculation for multi-beat bursts
  - Must remain stable while awvalid is asserted

#### **Burst Type Encoding:**

|                          |                                                              |
|--------------------------|--------------------------------------------------------------|
| awburst = 2'b00 (FIXED): | Address stays constant for all beats<br>Used for FIFO access |
| awburst = 2'b01 (INCR):  | Address increments by awsiz each beat                        |

|                                     |                                                                                                                                        |
|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|
|                                     | Most common burst type<br>No boundary restrictions                                                                                     |
| awburst = 2'b10 (WRAP):<br>boundary | Address increments then wraps at<br>Boundary = awsize × (awlen+1)<br>Used for cache line access<br>Burst length must be 2, 4, 8, or 16 |
| awburst = 2'b11 (RESERVED):         | Not used in AXI4                                                                                                                       |

### Address Calculation Examples:

*INCR Burst (awburst=01):*

```
awaddr = 0x1000, awsize = 2 (4 bytes), awlen = 3 (4 beats)
Beat 0: 0x1000
Beat 1: 0x1004
Beat 2: 0x1008
Beat 3: 0x100C
```

*WRAP Burst (awburst=10):*

```
awaddr = 0x100C, awsize = 2 (4 bytes), awlen = 3 (4 beats)
Boundary = 4 × 4 = 16 bytes (wraps at 0x1010)
Beat 0: 0x100C
Beat 1: 0x1010 → wraps to 0x1000
Beat 2: 0x1004
Beat 3: 0x1008
```

#### *awlock*

- **Direction:** Master → Slave
- **Width:** 1 bit
- **Description:** Atomic access (lock) indicator
- **Protocol Rules:**
  - 0 = Normal access
  - 1 = Exclusive access (atomic operation)
  - Bridge typically does NOT implement lock support (passes through)
  - Must remain stable while awvalid is asserted

#### *awcache*

- **Direction:** Master → Slave
- **Width:** 4 bits
- **Description:** Memory type and cacheable attributes
- **Protocol Rules:**

- Provides hints about caching behavior
- Bridge typically passes through unchanged
- Must remain stable while awvalid is asserted

### **Cache Encoding (Common Values):**

```
awcache[0] - Bufferable:      Can delay/combine with other writes
awcache[1] - Cacheable:       Can be cached
awcache[2] - Read Allocate:   Read miss allocates cache line
awcache[3] - Write Allocate:  Write miss allocates cache line
```

Common combinations:

```
4'b0000 - Device Non-bufferable (strict ordering, no caching)
4'b0001 - Device Bufferable (can buffer, no caching)
4'b0011 - Normal Non-cacheable Bufferable
4'b1111 - Write-back Read and Write allocate (full caching)
```

### *awprot*

- **Direction:** Master → Slave
- **Width:** 3 bits
- **Description:** Protection attributes
- **Protocol Rules:**
  - Used for security and privilege checking
  - Bridge typically passes through unchanged
  - Must remain stable while awvalid is asserted

### **Protection Encoding:**

```
awprot[0] - Privileged: 0 = Unprivileged, 1 = Privileged access
awprot[1] - Secure:     0 = Secure, 1 = Non-secure access
awprot[2] - Instruction: 0 = Data access, 1 = Instruction access
```

Examples:

```
3'b000 - Unprivileged, Secure, Data
3'b001 - Privileged, Secure, Data
3'b010 - Unprivileged, Non-secure, Data
3'b110 - Unprivileged, Non-secure, Instruction
```

### *awqos*

- **Direction:** Master → Slave
- **Width:** 4 bits
- **Description:** Quality of Service identifier
- **Protocol Rules:**
  - Higher value = higher priority (typically)

- 0 = default/no QoS requirements
- Bridge arbiter MAY use for priority decisions
- Must remain stable while awvalid is asserted

### *awregion*

- **Direction:** Master → Slave
- **Width:** 4 bits
- **Description:** Region identifier (up to 16 regions)
- **Protocol Rules:**
  - Used to access multiple logical address spaces
  - Bridge typically ignores (uses only awaddr)
  - Must remain stable while awvalid is asserted

### *awuser*

- **Direction:** Master → Slave
- **Width:** USER\_WIDTH bits (optional, configurable)
- **Description:** User-defined sideband signaling
- **Protocol Rules:**
  - User-defined extension to protocol
  - Bridge may pass through or use for custom features
  - Must remain stable while awvalid is asserted
  - Not all designs include this signal

## Handshake Timing

### Single-Cycle Transfer:



Transfer completes when awvalid && awready on rising edge.

### Wait States (Master):





Master asserts awvalid and waits 2 cycles for awready.

### Wait States (Slave):



Slave asserts awready and waits 2 cycles for awvalid.

### 4KB Boundary Rule

**Critical AXI4 Rule:** A burst must NOT cross a 4KB address boundary.

**Why 4KB?** - Page size in most memory management units - Prevents single burst from spanning multiple pages - Critical for virtual memory systems

### Checking for Violation:

```
// Calculate burst span
logic [ADDR_WIDTH-1:0] burst_span;
logic [11:0] upper_addr_bits;
logic [11:0] end_addr_bits;

burst_span = (awlen + 1) << awsize; // Total bytes in burst
upper_addr_bits = awaddr[ADDR_WIDTH-1:12];
end_addr_bits = (awaddr + burst_span - 1) >> 12;

// Violation if page boundary crossed
logic boundaryViolation;
assign boundaryViolation = (upper_addr_bits != end_addr_bits);
```

### Example Violations:

VALID:

awaddr = 0x0000\_0FF0, awlen = 3 (4 beats), awsize = 2 (4 bytes)  
 End = 0x0000\_0FF0 + 16 - 1 = 0x0000\_0FFF  
 Both in page 0x0000\_0xxx ✓

```
INVALID:  
awaddr = 0x0000_0FF0, awlen = 7 (8 beats), awsize = 2 (4 bytes)  
End = 0x0000_0FF0 + 32 - 1 = 0x0000_100F  
Spans pages 0x0000_0xxx and 0x0000_1xxx x
```

## Common Issues and Debug

### Issue 1: awvalid dropped before handshake

Symptom: Master deasserts awvalid before awready seen  
Check:

- awvalid must stay asserted until awready HIGH
- Master FSM logic

### Issue 2: Unaligned address

Symptom: awaddr not aligned to awsize  
Check:

- awaddr[awsize-1:0] must be zero
- Master address generation logic

### Issue 3: awsize exceeds bus width

Symptom: awsize = 4 (16 bytes) on 64-bit (8-byte) bus  
Check:

- awsize must be  $\leq \log_2(\text{DATA\_WIDTH}/8)$
- Master configuration

### Issue 4: 4KB boundary violation

Symptom: Burst crosses page boundary  
Check:

- Calculate end address: start + (len+1)×size
- Check if upper address bits change

### Issue 5: Invalid WRAP burst

Symptom: awburst=WRAP with illegal length

Check:

- WRAP bursts must have len  $\in \{1, 3, 7, 15\}$  (2/4/8/16 beats)
- awaddr must be aligned to burst span

## Bridge-Specific Behavior

### ID Width Extension:

Bridge may append bits to awid for routing:

Master awid: [ID\_WIDTH-1:0]

Bridge awid: [TOTAL\_ID\_WIDTH-1:0] = {bridge\_id, master\_awid}

This allows bridge to route responses back to correct master.

## Address Decoding:

```
Bridge router examines awaddr to select slave:  
if (awaddr >= SLAVE_BASE[i] && awaddr < SLAVE_END[i])  
    route to slave i  
else  
    generate DECERR response (address decode error)
```

## Arbitration:

When multiple masters request same slave:

- Bridge arbiter selects one master (round-robin, priority, etc.)
  - Winning master's AW channel connects to slave
  - Losing masters see awready = 0 (backpressure)
- 

**Related Sections:** - Section 3.2: Write Data Channel (W) - Section 3.3: Write Response Channel (B) - Section 2.1: Master Adapter (ID generation) - Section 2.2: Slave Router (address decoding) - Section 2.4: Arbitration

**Next:** [3.2 Write Data Channel \(W\)](#)

## 3.2 Write Data Channel (W)

The Write Data channel carries the actual write data from master to slave. Unlike the Write Address channel, the W channel does NOT carry an ID, so the bridge must maintain ordering between AW and W channels.

### 3.2.1 Overview

**Key Characteristics:** - Carries write data beats for a write transaction - No ID field (unlike other channels) - Must follow AW channel in order for same transaction - Can be interleaved with W beats from other transactions (different AWIDs) - Uses valid/ready handshake per beat

### 3.2.2 Channel Signals

```
// Write Data Channel (Master → Slave)  
output logic          wvalid;      // Write data valid  
input  logic          wready;      // Write data ready  
output logic [DATA_WIDTH-1:0] wdata;      // Write data  
output logic [STRB_WIDTH-1:0] wstrb;     // Write strobes (byte  
enables)  
output logic          wlast;       // Last beat of burst  
output logic [USER_WIDTH-1:0] wuser;      // User-defined sideband  
(optional)
```

Where:

STRB\_WIDTH = DATA\_WIDTH / 8 (one strobe bit per byte)

### 3.2.3 Signal Descriptions

#### wvalid

- **Direction:** Master → Slave
- **Width:** 1 bit
- **Description:** Indicates write data and strobes are valid
- **Protocol Rules:**
  - Master asserts when W channel information is valid
  - Must remain asserted until wready is HIGH on rising edge
  - Once asserted, cannot be deasserted until handshake completes
  - Must be LOW during reset
  - Multiple W beats can be pipelined (different transactions)

#### wready

- **Direction:** Slave → Master
- **Width:** 1 bit
- **Description:** Indicates slave is ready to accept write data
- **Protocol Rules:**
  - Slave asserts when ready to accept data
  - Can toggle freely (no dependency on wvalid)
  - Recommended to assert early for throughput
  - Can be LOW during reset or backpressure

#### wdata

- **Direction:** Master → Slave
- **Width:** DATA\_WIDTH bits (typically 32, 64, 128, 256, 512, or 1024)
- **Description:** Write data for current beat
- **Protocol Rules:**
  - Contains data to be written for this beat
  - Only bytes with corresponding wstrb bit set are written
  - Bytes with wstrb=0 can be any value (don't care)
  - Must remain stable while wvalid is asserted
  - Byte order: Little-endian within the bus width

#### Data Bus Layout (64-bit example):

```
wdata[63:56] - Byte 7  (highest address)
wdata[55:48] - Byte 6
wdata[47:40] - Byte 5
wdata[39:32] - Byte 4
wdata[31:24] - Byte 3
wdata[23:16] - Byte 2
wdata[15: 8] - Byte 1
wdata[ 7: 0] - Byte 0  (lowest address)
```

## wstrb

- **Direction:** Master → Slave
- **Width:** STRB\_WIDTH bits = DATA\_WIDTH/8 (one bit per data byte)
- **Description:** Byte-level write enables
- **Protocol Rules:**
  - Each bit enables writing of corresponding data byte
  - wstrb[n] = 1: Write wdata[8n+7:8n]
  - wstrb[n] = 0: Do NOT write wdata[8n+7:8n] (leave memory unchanged)
  - Must remain stable while wvalid is asserted
  - For narrow transfers (awsize < bus width), only relevant strobes are set

## Strobe Examples (64-bit bus):

```
wstrb = 8'b1111_1111 - Write all 8 bytes (full 64-bit write)
wstrb = 8'b0000_1111 - Write bytes 0-3 only (lower 32 bits)
wstrb = 8'b1111_0000 - Write bytes 4-7 only (upper 32 bits)
wstrb = 8'b0000_0001 - Write byte 0 only (8-bit write to lowest
address)
wstrb = 8'b1010_1010 - Write alternating bytes (0, 2, 4, 6)
```

## Strobe Patterns Based on awsize:

For a 64-bit bus (8 bytes), different awsize values generate different strobe patterns:

```
awsize = 3'b000 (1 byte):
awaddr[2:0] = 0 → wstrb = 8'b0000_0001
awaddr[2:0] = 1 → wstrb = 8'b0000_0010
awaddr[2:0] = 2 → wstrb = 8'b0000_0100
...
awaddr[2:0] = 7 → wstrb = 8'b1000_0000

awsize = 3'b001 (2 bytes):
awaddr[2:1] = 0 → wstrb = 8'b0000_0011
awaddr[2:1] = 1 → wstrb = 8'b0000_1100
```

```

awaddr[2:1] = 2 → wstrb = 8'b0011_0000
awaddr[2:1] = 3 → wstrb = 8'b1100_0000

awszie = 3'b010 (4 bytes):
awaddr[2] = 0 → wstrb = 8'b0000_1111
awaddr[2] = 1 → wstrb = 8'b1111_0000

awszie = 3'b011 (8 bytes):
wstrb = 8'b1111_1111 (all bytes)

```

### wlast

- **Direction:** Master → Slave
- **Width:** 1 bit
- **Description:** Indicates last data beat in write burst
- **Protocol Rules:**
  - Asserted on final W beat of transaction
  - Beat count must match awlen + 1 from corresponding AW
  - Only ONE W beat in a burst has wlast=1
  - Must remain stable while wvalid is asserted
  - Critical for burst boundary detection

### wlast Timing:

Transaction with awlen=3 (4 beats total):

```

Beat 0: wvalid=1, wlast=0 → First beat
Beat 1: wvalid=1, wlast=0 → Middle beat
Beat 2: wvalid=1, wlast=0 → Middle beat
Beat 3: wvalid=1, wlast=1 → LAST beat (triggers B response)

```

Single beat (awlen=0):

Beat 0: wvalid=1, wlast=1 → First AND last

### wuser

- **Direction:** Master → Slave
- **Width:** USER\_WIDTH bits (optional, configurable)
- **Description:** User-defined sideband signaling
- **Protocol Rules:**
  - User-defined extension to protocol
  - Bridge may pass through or use for custom features
  - Must remain stable while wvalid is asserted
  - Not all designs include this signal

### 3.2.4 Handshake Timing

#### Single Beat Transfer:



#### Burst Transfer (4 Beats):



#### Back Pressure (wready=0):



Beat 0 transferred cycle 1  
Beat 1 stalled cycles 2-4 (wready=0)  
Beat 1 transferred cycle 5

### 3.2.5 Relationship to AW Channel

**Critical Rule:** W channel data must correspond to AW channel addresses in order.

#### AW-W Ordering

Master sends:

1. AW transaction (awid=5, awlen=3) → 4 W beats required
2. W beats (4 beats total)
3. AW transaction (awid=7, awlen=1) → 2 W beats required
4. W beats (2 beats total)

The W beats MUST match AW transactions in FIFO order:

First 4 W beats → awid=5 transaction  
Next 2 W beats → awid=7 transaction

#### W Interleaving (AXI4 Does NOT Allow)

**IMPORTANT:** AXI4 does NOT allow W channel interleaving.

ILLEGAL in AXI4:

```
AW: awid=5, awlen=3 (needs 4 W beats)
AW: awid=7, awlen=1 (needs 2 W beats)
W: beat for awid=5
W: beat for awid=7 ← ILLEGAL! Must finish awid=5 first
W: beat for awid=5
...

```

LEGAL in AXI4:

```
AW: awid=5, awlen=3
AW: awid=7, awlen=1
W: beat 0 for awid=5
W: beat 1 for awid=5
W: beat 2 for awid=5
W: beat 3 for awid=5 (wlast=1)
W: beat 0 for awid=7
W: beat 1 for awid=7 (wlast=1)
```

**Why No Interleaving?** - W channel has no ID field - Slave must match W beats to AW in order - Simplifies slave implementation - Note: AW channel CAN interleave (different AWIDs can go out of order)

#### Bridge W Channel Behavior

Bridge must maintain ALL master's W channel ordering:

- Cannot interleave W from different masters
- Must serialize W channels from multiple masters
- Arbiter grants AW, then W must follow
- Next master's W cannot start until previous W completes (wlast seen)

Example Timeline:

Cycle 0: Master0 AW granted (awid=M0\_5, awlen=1)  
Cycle 1: Master0 W beat 0  
Cycle 2: Master0 W beat 1 (wlast=1)  
Cycle 3: Master1 AW granted (awid=M1\_3, awlen=0)  
Cycle 4: Master1 W beat 0 (wlast=1)

^ ^ ^

Master1 W cannot start before Master0 wlast

### 3.2.6 Burst Length Matching

**Rule:** Number of W beats must match awlen + 1 from corresponding AW.

#### Verification Check

```
// Track W beat count per AW transaction
logic [7:0] expected_w_beats; // From awlen + 1
logic [7:0] actual_w_beats; // Counter

always_ff @(posedge aclk) begin
    if (!aresetn) begin
        expected_w_beats <= 0;
        actual_w_beats <= 0;
    end else begin
        // Capture expected count when AW transfers
        if (awvalid && awready) begin
            expected_w_beats <= awlen + 1;
            actual_w_beats <= 0;
        end

        // Count W beats
        if (wvalid && wready) begin
            actual_w_beats <= actual_w_beats + 1;

            // Check on last beat
            if (wlast) begin
                assert(actual_w_beats + 1 == expected_w_beats)
                    else $error("W beat count mismatch!");
            end
        end
    end
end
```

#### Common Errors

##### Error 1: Too few W beats

awlen = 3 (expect 4 beats)  
W: beat 0

```
W: beat 1
W: beat 2 (wlast=1) ← ERROR! wlast asserted after only 3 beats
```

### Error 2: Too many W beats

```
awlen = 1 (expect 2 beats)
W: beat 0
W: beat 1 (wlast=1)
W: beat 2 ← ERROR! Extra beat after wlast
```

### Error 3: wlast missing

```
awlen = 2 (expect 3 beats)
W: beat 0 (wlast=0)
W: beat 1 (wlast=0)
W: beat 2 (wlast=0) ← ERROR! wlast never asserted
```

## 3.2.7 Data Width Conversion

When masters and slaves have different data widths, the bridge may insert width converters.

### Upsizing (Narrow Master → Wide Slave)

#### Example: 32-bit master writing to 64-bit slave

Master sends:

```
awaddr = 0x1000, awsize = 2 (4 bytes), awlen = 3 (4 beats)
W beat 0: wdata[31:0] = 0xAAAA_AAAA, wstrb = 4'b1111
W beat 1: wdata[31:0] = 0xB BBBB_BBBB, wstrb = 4'b1111
W beat 2: wdata[31:0] = 0xCC CCCC_CCCC, wstrb = 4'b1111
W beat 3: wdata[31:0] = 0xDD DDDD_DDDD, wstrb = 4'b1111
```

Converter packs → Slave sees:

```
awaddr = 0x1000, awsize = 3 (8 bytes), awlen = 1 (2 beats)
W beat 0: wdata[63:0] = 0xB BBBB_BBBB_AAAA_AAAA, wstrb = 8'b1111_1111
W beat 1: wdata[63:0] = 0xD DDDD_DDDD_CCCC_CCCC, wstrb = 8'b1111_1111
```

### Downsizing (Wide Master → Narrow Slave)

#### Example: 64-bit master writing to 32-bit slave

Master sends:

```
awaddr = 0x1000, awsize = 3 (8 bytes), awlen = 1 (2 beats)
W beat 0: wdata[63:0] = 0xB BBBB_BBBB_AAAA_AAAA, wstrb = 8'b1111_1111
W beat 1: wdata[63:0] = 0xD DDDD_DDDD_CCCC_CCCC, wstrb = 8'b1111_1111
```

Converter splits → Slave sees:

```

awaddr = 0x1000, awsize = 2 (4 bytes), awlen = 3 (4 beats)
W beat 0: wdata[31:0] = 0xAAAA_AAAA, wstrb = 4'b1111
W beat 1: wdata[31:0] = 0xB BBBB_BBBB, wstrb = 4'b1111
W beat 2: wdata[31:0] = 0xCC CCCC_CCCC, wstrb = 4'b1111
W beat 3: wdata[31:0] = 0xDD DDDD_DDDD, wstrb = 4'b1111

```

See Section 2.6 for full width conversion details.

### 3.2.8 Write Strobes and Partial Writes

#### Strobe Calculation

Based on awaddr, awsize, and awlen, calculate which bytes are written:

```

function automatic logic [STRB_WIDTH-1:0] calc_wstrb(
    input logic [ADDR_WIDTH-1:0] addr,
    input logic [2:0] size,
    input int beat_number
);
    logic [ADDR_WIDTH-1:0] beat_addr;
    logic [STRB_WIDTH-1:0] strb;
    int byte_offset;
    int num_bytes;

    // Calculate address for this beat
    beat_addr = addr + (beat_number << size);

    // Number of bytes in transfer
    num_bytes = 1 << size;

    // Byte offset within data bus
    byte_offset = beat_addr % STRB_WIDTH;

    // Generate strobe
    strb = ((1 << num_bytes) - 1) << byte_offset;

    return strb;
endfunction

```

#### Sparse Writes

Strobes allow byte-granular writes:

Write 0xAB to address 0x1002 on a 32-bit bus:

```

awaddr = 0x1000 (aligned to word)
awszie = 0 (1 byte)
awlen = 0 (1 beat)

```

Address within word: 0x1002 - 0x1000 = byte 2

```
wdata = 32'hXX_AB_XX_XX (only byte 2 matters)
wstrb = 4'b0100 (only enable byte 2)

Memory at 0x1000: [byte3][byte2][byte1][byte0]
After write:      [ - ][ 0xAB][ - ][ - ]
                           ^^^
                           Only this byte written
```

### Full vs Partial Strobes

Full Write (all strobes set):

```
wstrb = 8'b1111_1111 → All bytes written
Simple for slave (no read-modify-write)
```

Partial Write (some strobes clear):

```
wstrb = 8'b0000_1111 → Lower 4 bytes written
Slave must preserve upper 4 bytes unchanged
May require read-modify-write in some slaves
```

## 3.2.9 Common Issues and Debug

### Issue 1: W Channel Starvation

Symptom: AW transfers but W channel never provides data

Cause:

- Master FSM bug (forgot to drive W)
- W FIFO empty
- W channel disconnected

Check:

- After AW transfer, W must follow
- Monitor wvalid after awvalid

### Issue 2: W Beat Count Mismatch

Symptom: Wrong number of W beats for AW transaction

Cause:

- wlast logic error
- W counter bug
- Interleaving attempted

Check:

- Count W beats between AW transfer and wlast
- Must equal (awlen + 1)

### Issue 3: W Interleaving Attempted

Symptom: Protocol violation, wrong data written

Cause:

- Multiple masters driving W simultaneously (bridge bug)
- Master trying to interleave different transactions

Check:

- Only ONE W transaction active at a time
- Complete all W beats (until wlast) before next AW

#### **Issue 4: Invalid wstrb**

Symptom: Unexpected bytes written or not written

Cause:

- wstrb doesn't match awsize and awaddr
- wstrb has illegal pattern

Check:

- Calculate expected wstrb from awaddr/awsize
- Ensure contiguous '1' bits in wstrb (for aligned transfers)

#### **Issue 5: wvalid Dropped Before Handshake**

Symptom: Data beat lost

Cause:

- wvalid deasserted before wready seen

Check:

- wvalid must stay HIGH until (wvalid && wready)
- Master FSM must hold wvalid

### **3.2.10 Bridge-Specific W Channel Considerations**

#### **W Channel Arbitration**

Bridge W channel follows AW arbitration:

Step 1: Arbiter grants AW from master M (awid=X, awlen=N)

Step 2: Bridge connects M's W channel to slave

Step 3: Bridge waits for (N+1) W beats until wlast

Step 4: Bridge can grant next AW (possibly different master)

During Step 3:

- Other masters' W channels are blocked
- Other masters see wready=0 from bridge
- This master's W has exclusive access

#### **W Channel Routing**

Unlike AW/AR channels (routed by address), W channel:

- Has no address field
- Has no ID field
- Must follow corresponding AW

Bridge routing:

- After AW routes to slave S
- W channel automatically routes to same slave S
- Routing held until wlast seen
- Then released for next transaction

#### **W Channel Buffering**

Bridge may include W data buffering:

- Small FIFO to decouple master and slave timing
- Improves throughput
- Allows master to push data ahead

Depth typically:

- 2-4 entries for basic buffering
  - Deeper (8-16) for high-performance
- 

**Related Sections:** - Section 3.1: Write Address Channel (AW) - Section 3.3: Write Response Channel (B) - Section 2.6: Width Conversion - Section 2.4: Arbitration

**Next:** [3.3 Write Response Channel \(B\)](#)

## 3.3 Write Response Channel (B)

The Write Response (B) channel carries the acknowledgment from slave to master that a write transaction has completed. It includes the transaction ID and status.

### 3.3.1 Overview

**Key Characteristics:** - Travels from slave back to master (response path) - Contains transaction ID for matching to AW - Single beat per write transaction (regardless of awlen) - Can complete out-of-order for different IDs - Must complete in-order for same ID

### 3.3.2 Channel Signals

```
// Write Response Channel (Slave → Master)
 input logic bvalid; // Write response valid
 output logic bready; // Write response ready
 input logic [ID_WIDTH-1:0] bid; // Response ID (matches
awid)
 input logic [1:0] bresp; // Write response status
 input logic [USER_WIDTH-1:0] buser; // User-defined sideband
(optional)
```

### 3.3.3 Signal Descriptions

#### bvalid

- **Direction:** Slave → Master
- **Width:** 1 bit
- **Description:** Indicates write response is valid
- **Protocol Rules:**
  - Slave asserts when write transaction completes

- Must remain asserted until bready is HIGH on rising edge
- Once asserted, cannot be deasserted until handshake completes
- Must be LOW during reset
- One B response per AW transaction

### bready

- **Direction:** Master → Slave
- **Width:** 1 bit
- **Description:** Indicates master is ready to accept response
- **Protocol Rules:**
  - Master asserts when ready for write response
  - Can be asserted before, during, or after bvalid
  - Recommended to assert early to avoid backpressure
  - Can toggle freely
  - Master should track outstanding writes and be ready

### bid

- **Direction:** Slave → Master
- **Width:** ID\_WIDTH bits (matches awid width)
- **Description:** Write response ID - matches original awid
- **Protocol Rules:**
  - Must match awid from corresponding AW transaction
  - Used by master to match response to original request
  - Bridge may use extended ID for response routing
  - Must remain stable while bvalid is asserted
  - Critical for out-of-order completion

### ID Matching:

Master sends:

AW: awid = 5  
W: (awlen+1) beats

Slave eventually responds:

B: bid = 5 ← Must match original awid

### bresp

- **Direction:** Slave → Master
- **Width:** 2 bits
- **Description:** Write response status

- **Protocol Rules:**
  - Indicates success or error of write transaction
  - All beats in the burst share same response
  - Must remain stable while bvalid is asserted

### **Response Encoding:**

bresp = 2'b00 (OKAY):     Write successful  
                                  All data written correctly  
                                  Normal completion

bresp = 2'b01 (EXOKAY): Exclusive access okay  
                                  Atomic operation succeeded  
                                  Rare in most systems

bresp = 2'b10 (SLVERR): Slave error  
                                  Slave detected error  
                                  Examples:  
                                  - Invalid register address  
                                  - Write to read-only register  
                                  - Permission denied  
                                  - Timeout on downstream bus

bresp = 2'b11 (DECERR): Decode error  
                                  No slave at this address  
                                  Generated by interconnect  
                                  Master requested unmapped address

### **Response Priority:**

If ANY beat encounters error, entire burst reports error:  
   - DECERR > SLVERR > EXOKAY > OKAY  
   - Worst error wins

Example:

Beat 0: OKAY  
 Beat 1: SLVERR ← Error on one beat  
 Beat 2: OKAY  
 Beat 3: OKAY

Final B response: bresp = SLVERR (error propagates to all)

### **buser**

- **Direction:** Slave → Master
- **Width:** USER\_WIDTH bits (optional, configurable)
- **Description:** User-defined sideband signaling
- **Protocol Rules:**

- User-defined extension to protocol
- Must remain stable while bvalid is asserted
- Not all designs include this signal

### 3.3.4 Handshake Timing

#### Zero-Wait Response:



#### Slave Wait States:



Slave waits 2 cycles for master to be ready.

#### Master Wait States:



Master ready early, waits 2 cycles for slave response.

### 3.3.5 Write Transaction Completion

#### Basic Write Flow

- Step 1: Master sends AW (awid=5, awlen=3)
- Step 2: Master sends 4 W beats (wlast on beat 3)
- Step 3: Slave processes write
- Step 4: Slave sends B (bid=5, bresp=OKAY)
- Step 5: Master receives B, transaction complete

#### Timeline:

```
Cycle 0: AW transfer (awid=5, awaddr=0x1000, awlen=3)
Cycle 1: W beat 0 transfer
Cycle 2: W beat 1 transfer
Cycle 3: W beat 2 transfer
Cycle 4: W beat 3 transfer (wlast=1)
         ^
         ^
         Slave can now generate B response
Cycle 5: ... (slave processing)
Cycle 6: B transfer (bid=5, bresp=OKAY)
```

#### When Does Slave Generate B?

#### Minimum Requirement:

Slave MUST see final W beat (wlast=1) before generating B

- B response covers ALL beats in the burst
- Cannot respond until all data received

#### Typical Timing:

Fast Slave (0 latency):

```
Cycle N: wlast=1 on W channel
Cycle N+1: bvalid=1 on B channel
```

Medium Slave (1-2 cycles):

```
Cycle N: wlast=1
Cycle N+1: (internal processing)
Cycle N+2: bvalid=1
```

Slow Slave (>2 cycles):

```
Cycle N: wlast=1
Cycles N+1 to N+K: (processing, may involve memory access)
Cycle N+K+1: bvalid=1
```

#### Outstanding Write Tracking

```
// Master tracks outstanding writes
logic [3:0] outstandingWrites; // Count of AW sent but B not
received
```

```

always_ff @(posedge aclk) begin
    if (!aresetn) begin
        outstanding_writes <= 0;
    end else begin
        case ({awvalid && awready, bvalid && bready})
            2'b10: outstanding_writes <= outstanding_writes + 1; // Aw only
            2'b01: outstanding_writes <= outstanding_writes - 1; // B only
            2'b11: outstanding_writes <= outstanding_writes; // Both
            2'b00: outstanding_writes <= outstanding_writes; // Neither
        endcase
    end
end

// Master should not overflow limit
assign aw_can_issue = (outstanding_writes < MAX_OUTSTANDING);

```

### 3.3.6 Ordering Rules

#### In-Order for Same ID

**Rule:** Responses with same ID must complete in order of their AW.

LEGAL:

```

Cycle 0: AW (awid=5)
Cycle 1: AW (awid=5)
Cycle 5: B (bid=5) ← First awid=5
Cycle 6: B (bid=5) ← Second awid=5 (must be after first)

```

ILLEGAL:

```

Cycle 0: AW (awid=5) transaction A
Cycle 1: AW (awid=5) transaction B
Cycle 5: B (bid=5) ← For transaction B
Cycle 6: B (bid=5) ← For transaction A
^^^

```

Order violation! Same ID responses out of order

#### Out-of-Order for Different IDs

**Rule:** Responses with different IDs CAN complete out of order.

LEGAL:

```

Cycle 0: AW (awid=5)
Cycle 1: AW (awid=7)
Cycle 5: B (bid=7) ← awid=7 completes first
Cycle 6: B (bid=5) ← awid=5 completes second

```

^^^

Legal! Different IDs can reorder

## Bridge Response Routing

In multi-master bridge, bid contains routing information:

Extended ID Structure:

```
bid = {bridge_id, original_master_id}
```

Bridge uses bridge\_id to route B back to correct master:

```
if (bid[upper_bits] == BRIDGE_ID_M0)
    route B to Master 0
else if (bid[upper_bits] == BRIDGE_ID_M1)
    route B to Master 1
...

```

### 3.3.7 Error Handling

#### OKAY - Normal Completion

```
bresp = 2'b00 (OKAY)
```

Meaning:

- All data written successfully
- No errors detected
- Normal case

Master Action:

- Transaction complete
- Continue normal operation

#### SLVERR - Slave Error

```
bresp = 2'b10 (SLVERR)
```

Possible Causes:

- Write to read-only register
- Address within slave range but invalid
- Protected/secure access violation
- Slave internal error
- Downstream error

Master Action:

- Log error
- May retry or report to software
- Consider transaction failed
- Do NOT assume data was written

## **DECERR - Decode Error**

bresp = 2'b11 (DECERR)

Causes:

- Address not mapped to any slave
- Bridge router detected out-of-range address
- Generated by interconnect, not by slave

Master Action:

- Log serious error
- Address may be corrupt or misconfigured
- Do NOT retry (address is invalid)
- Report to software for investigation

## **Bridge DECERR Generation:**

```
// Bridge generates DECERR for unmapped addresses
logic oor_error; // Out-of-range

// After AW with unmapped address
always_ff @(posedge aclk) begin
    if (aw_unmapped && awvalid && awready) begin
        // Sink W channel data
        // ...

        // Generate DECERR response
        bvalid <= 1'b1;
        bid <= captured_awid;
        bresp <= 2'b11; // DECERR
    end
end
```

## **EXOKAY - Exclusive Okay**

bresp = 2'b01 (EXOKAY)

Used For:

- Atomic/exclusive accesses
- Read-modify-write operations
- Semaphores, mutexes

Meaning:

- Exclusive write succeeded
- No other master modified location

Note:

- Rare in typical bridge configurations
- Requires exclusive access monitor
- Most bridges pass through or convert to OKAY

### 3.3.8 Common Issues and Debug

#### Issue 1: Missing B Response

Symptom: Master sends AW and W, but never receives B

Causes:

- Slave stuck or hung
- B channel disconnected
- Slave waiting for missing W data
- Bridge routing error

Debug:

- Check wlast was asserted
- Check slave received all W beats
- Monitor slave internal state
- Check bridge ID routing logic
- Implement timeout mechanism

#### Issue 2: ID Mismatch

Symptom: bid does not match any outstanding awid

Causes:

- Bridge routing error
- ID corruption
- Spurious B response

Debug:

- Log all awid values sent
- Compare bid to outstanding transactions
- Check bridge ID extension logic
- Verify slave ID pass-through

#### Issue 3: Wrong Number of B Responses

Symptom: More/fewer B responses than AW requests

Causes:

- Lost AW transaction
- Duplicate B response
- Bridge accounting error

Debug:

- Count AW transfers vs B transfers
- Should be 1:1 correspondence
- Track outstanding transactions

#### Issue 4: Out-of-Order Same ID

Symptom: Two awid=X transactions complete in wrong order

Causes:

- Slave reordering logic bug
- Bridge routing error
- Out-of-order memory

Debug:

- Log AW order and B order for same ID
- Check slave ordering guarantees
- Verify ID matching logic

### Issue 5: Unexpected Error Response

Symptom: bresp=SLVERR or DECERR on valid address

Causes:

- Protection violation
- Slave misconfiguration
- Transient error
- Bridge incorrectly flagging as out-of-range

Debug:

- Check address against memory map
- Verify awprot settings
- Check slave status registers
- Review bridge address decoder

## 3.3.9 Bridge-Specific Considerations

### Response Routing

Multi-master bridge must route B to correct master:

Method 1: ID-based routing (most common)

- Bridge extends ID: {bridge\_id, master\_id}
- B response contains full extended ID
- Bridge extracts bridge\_id to route B

Method 2: CAM/FIFO tracking

- Bridge records which master sent which awid
- Uses CAM to match bid to original master
- See Section 2.5 for details

### Response Forwarding

Bridge forwards bresp unchanged:

- OKAY → OKAY
- SLVERR → SLVERR
- DECERR → DECERR (from slave)

Bridge generates DECERR for unmapped addresses:

- Router detects address out of range
- Sinks W channel data
- Generates B with DECERR
- bid matches captured awid

## Response Buffering

Bridge may buffer B responses:

- Small FIFO (2-4 entries typical)
- Allows slave to push responses
- Reduces backpressure
- Master can be slow to accept

Buffering helps when:

- Multiple masters share response path
- Master occasionally busy ( $b_{ready}=0$ )
- Improves overall throughput

## Outstanding Transaction Limits

Bridge limits outstanding writes per master:

- Tracks count: AW sent, B not yet received
- Backpressures when limit reached
- Prevents CAM overflow
- Typical limits: 16-64 outstanding

If limit reached:

- $a_{ready} = 0$  to master
- Wait for B responses to free slots
- Then allow more AW

### 3.3.10 Performance Implications

#### Latency

Write Latency = Time from AW to B

Components:

1. AW to slave
2. W beats to slave
3. Slave processing
4. B back to master

Total: Typically 3-20 cycles for on-chip slave  
100+ cycles for off-chip memory

#### Throughput

Posted Writes:

- Master sends AW+W without waiting for B
- Can pipeline multiple writes
- B responses come later
- High throughput

Blocking Writes:

- Master waits for B before next AW

- Lower throughput
- Simpler master logic

## Pipelining

Maximum throughput when:

- Master issues AWs back-to-back
- bready always HIGH (ready for B)
- Slave bvalid frequently HIGH
- Deep outstanding queue

Limited by:

- Slave B response rate
- Master outstanding limit
- Bridge CAM depth

### 3.3.11 Verification Checks

```
// Assertion: Every Aw must get exactly one B
property aw_b_correspondence;
  @(posedge aclk) disable iff (!aresetn)
    awvalid && awready |-> ##[1:MAX_LATENCY] bvalid && bready;
endproperty

// Assertion: bid must match outstanding awid
property bid_matches_outstanding;
  @(posedge aclk) disable iff (!aresetn)
    bvalid && bready |-> outstanding_set[bid];
endproperty

// Assertion: Same ID responses in order
property same_id_order;
  @(posedge aclk) disable iff (!aresetn)
    (awvalid && awready && awid == ID_X) #1
    (awvalid && awready && awid == ID_X) |->
      first_matches(bvalid && bready && bid == ID_X) #1
      first_matches(bvalid && bready && bid == ID_X);
endproperty
```

---

**Related Sections:** - Section 3.1: Write Address Channel (AW) - Section 3.2: Write Data Channel (W) - Section 2.5: ID Management - Section 2.8: Response Routing - Section 2.9: Error Handling

**Next:** [3.4 Read Address Channel \(AR\)](#)

## 3.4 Read Address Channel (AR)

The Read Address channel carries all information required to describe a read transaction. It is similar to the Write Address channel but initiates read operations instead of writes.

### 3.4.1 Overview

**Key Characteristics:** - Travels from master to slave (request path) - Independent of write channels (AW/W/B) - Can be issued before, during, or after write operations - Supports bursts (1-256 beats) - Uses valid/ready handshake

### 3.4.2 Channel Signals

```
// Read Address Channel (Master → Slave)
output logic arvalid;          // Read address valid
input  logic arready;          // Read address ready
output logic [ID_WIDTH-1:0] arid; // Read address ID
output logic [ADDR_WIDTH-1:0] araddr; // Read address
output logic [7:0] arlen;       // Burst length (0-255)
output logic [2:0] arsize;      // Burst size (2^arsize
                             bytes per beat)
output logic [1:0] arbust;      // Burst type
output logic arlock;           // Lock type
output logic [3:0] arcache;     // Cache attributes
output logic [2:0] arprot;      // Protection attributes
output logic [3:0] arqos;        // Quality of Service
output logic [3:0] arregion;    // Region identifier
output logic [USER_WIDTH-1:0] aruser; // User-defined sideband
                                         (optional)
```

### 3.4.3 Signal Descriptions

#### arvalid

- **Direction:** Master → Slave
- **Width:** 1 bit
- **Description:** Indicates read address channel signals are valid
- **Protocol Rules:**
  - Master asserts when address phase information is valid
  - Must remain asserted until arready is HIGH on rising edge
  - Once asserted, cannot be deasserted until handshake completes
  - Must be LOW during reset

#### arready

- **Direction:** Slave → Master

- **Width:** 1 bit
- **Description:** Indicates slave is ready to accept address
- **Protocol Rules:**
  - Slave asserts when ready to accept address information
  - Can be asserted before, during, or after arvalid is asserted
  - Recommended to assert before arvalid for zero-wait-state transfers
  - Can toggle freely (no requirement to wait for arvalid)

### **arid**

- **Direction:** Master → Slave
- **Width:** ID\_WIDTH bits (typically 1-16)
- **Description:** Read transaction identifier
- **Protocol Rules:**
  - Uniquely identifies read transactions from a master
  - Transactions with different IDs can complete out-of-order
  - Transactions with same ID must complete in order
  - Bridge may append additional bits for multi-master routing
  - Must remain stable while arvalid is asserted
  - R response will carry matching rid

### **Out-of-Order Reads:**

Master issues:

```
AR: arid=3, araddr=0x1000
AR: arid=5, araddr=0x2000
```

Slave can respond in ANY order:

```
R: rid=5 (all beats) ← Can complete first
R: rid=3 (all beats) ← Even though issued first
```

This allows fast/cached reads to overtake slow reads.

### **araddr**

- **Direction:** Master → Slave
- **Width:** ADDR\_WIDTH bits (typically 32 or 64)
- **Description:** Starting address of read burst
- **Protocol Rules:**
  - Byte address of the first beat in the burst
  - Must be aligned to arsize (araddr[arsize-1:0] == 0)
  - Subsequent beat addresses calculated using arburst type

- Must remain stable while arvalid is asserted
- Bridge router uses bits to select target slave

## **Addressing Examples:**

Single word read (32-bit):

araddr = 0x0000\_1000, arsize = 2 (4 bytes), arlen = 0 (1 beat)  
Reads bytes at 0x1000-0x1003

Cache line read (64-byte, 64-bit bus):

araddr = 0x0000\_2000, arsize = 3 (8 bytes), arlen = 7 (8 beats)  
Beat 0: 0x2000-0x2007  
Beat 1: 0x2008-0x200F  
...  
Beat 7: 0x2038-0x203F

### **arlen**

- **Direction:** Master → Slave
- **Width:** 8 bits
- **Description:** Number of data beats in burst (actual beats = arlen + 1)
- **Protocol Rules:**
  - Valid range: 0 to 255 (1 to 256 beats)
  - arlen = 0 → 1 beat (single transfer)
  - arlen = 15 → 16 beats (common for cache line fill)
  - arlen = 255 → 256 beats (maximum)
  - Must remain stable while arvalid is asserted

## **Read Burst Lengths:**

|             |             |                                  |
|-------------|-------------|----------------------------------|
| arlen = 0   | → 1 beat    | (single read)                    |
| arlen = 3   | → 4 beats   | (small burst)                    |
| arlen = 7   | → 8 beats   | (common)                         |
| arlen = 15  | → 16 beats  | (cache line - 64B on 32-bit bus) |
| arlen = 255 | → 256 beats | (max, rarely used)               |

### **arsize**

- **Direction:** Master → Slave
- **Width:** 3 bits
- **Description:** Size of each beat in bytes ( $2^{\text{arsize}}$ )
- **Protocol Rules:**
  - Valid range: 0 to  $\log_2(\text{DATA\_WIDTH}/8)$
  - Cannot exceed data bus width
  - All beats in burst use same size

- Must remain stable while arvalid is asserted

### Size Encoding:

|                 |             |           |                   |
|-----------------|-------------|-----------|-------------------|
| arsize = 3'b000 | → 1 byte    | ( $2^0$ ) | - Byte read       |
| arsize = 3'b001 | → 2 bytes   | ( $2^1$ ) | - Halfword read   |
| arsize = 3'b010 | → 4 bytes   | ( $2^2$ ) | - Word read       |
| arsize = 3'b011 | → 8 bytes   | ( $2^3$ ) | - Doubleword read |
| arsize = 3'b100 | → 16 bytes  | ( $2^4$ ) |                   |
| arsize = 3'b101 | → 32 bytes  | ( $2^5$ ) |                   |
| arsize = 3'b110 | → 64 bytes  | ( $2^6$ ) |                   |
| arsize = 3'b111 | → 128 bytes | ( $2^7$ ) |                   |

### Bus Width Constraints:

32-bit bus (4 bytes): arsize ≤ 3'b010 (max 4 bytes)  
 64-bit bus (8 bytes): arsize ≤ 3'b011 (max 8 bytes)  
 128-bit bus (16 bytes): arsize ≤ 3'b100 (max 16 bytes)  
 256-bit bus (32 bytes): arsize ≤ 3'b101 (max 32 bytes)

Narrow reads (arsize < bus width) are legal and common.

### **arburst**

- **Direction:** Master → Slave
- **Width:** 2 bits
- **Description:** Burst type - how addresses increment
- **Protocol Rules:**
  - Controls address calculation for multi-beat bursts
  - Must remain stable while arvalid is asserted

### Burst Type Encoding:

|                                     |                                                                                                                                          |
|-------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| arburst = 2'b00 (FIXED):            | Address stays constant for all beats<br>Used for FIFO reads<br>All beats read from same address                                          |
| arburst = 2'b01 (INCR):             | Address increments by arsize each beat<br>Most common burst type for memory<br>No boundary restrictions                                  |
| arburst = 2'b10 (WRAP):<br>boundary | Address increments then wraps at<br>Boundary = arsize × (arlen+1)<br>Used for cache line wrapping<br>Burst length must be 2, 4, 8, or 16 |
| arburst = 2'b11 (RESERVED):         | Not used in AXI4                                                                                                                         |

## **Read Burst Address Examples:**

### *INCR Burst:*

```
araddr = 0x1000, arsize = 2 (4 bytes), arlen = 3, arburst = INCR
Beat 0: Read 0x1000-0x1003
Beat 1: Read 0x1004-0x1007
Beat 2: Read 0x1008-0x100B
Beat 3: Read 0x100C-0x100F
```

### *WRAP Burst:*

```
araddr = 0x100C, arsize = 2 (4 bytes), arlen = 3, arburst = WRAP
Wrap boundary = 4 × 4 = 16 bytes (0x1000-0x100F)
Beat 0: Read 0x100C-0x100F
Beat 1: Read 0x1000-0x1003 (wrapped)
Beat 2: Read 0x1004-0x1007
Beat 3: Read 0x1008-0x100B
```

Useful for circular buffers and cache line wrapping.

### *FIXED Burst (FIFO):*

```
araddr = 0x2000, arsize = 2 (4 bytes), arlen = 3, arburst = FIXED
Beat 0: Read 0x2000-0x2003
Beat 1: Read 0x2000-0x2003 (same address)
Beat 2: Read 0x2000-0x2003 (same address)
Beat 3: Read 0x2000-0x2003 (same address)
```

Used for reading from FIFO or stream interface.

## **arlock, arcache, arprot, arqos, arregion**

These signals have identical encoding and behavior to their write channel equivalents (awlock, awcache, etc.). See Section 3.1 for detailed descriptions.

**Quick Reference:** - **arlock:** Atomic/exclusive access (1 bit) - **arcache:** Cache attributes (4 bits) - **arprot:** Protection/privilege attributes (3 bits) - **arqos:** Quality of Service priority (4 bits) - **arregion:** Region identifier (4 bits)

## **aruser**

- **Direction:** Master → Slave
- **Width:** USER\_WIDTH bits (optional, configurable)
- **Description:** User-defined sideband signaling
- **Protocol Rules:**
  - User-defined extension to protocol
  - Must remain stable while arvalid is asserted

- Not all designs include this signal

### 3.4.4 Handshake Timing

#### Single-Cycle Transfer:



#### Master Wait States:



Master waits 2 cycles for slave to become ready.

#### Slave Wait States:



Slave ready early, waits for master to assert arvalid.

### 3.4.5 Read vs Write Channel Differences

#### Key Differences from AW Channel

|            | AR Channel                        | AW Channel                             |
|------------|-----------------------------------|----------------------------------------|
| Data Beat: | R (from slave)                    | W (from master)                        |
| Response:  | R channel carries data AND status | Separate B channel carries only status |

|               |                                   |                                |
|---------------|-----------------------------------|--------------------------------|
| Completion:   | All (arlen+1) R beats             | All W beats + 1 B response     |
| Interleaving: | R channel CAN<br>interleave by ID | W channel CANNOT<br>interleave |
| Data Flow:    | Slave → Master                    | Master → Slave                 |

### Independence from Write Channels

AR and AW channels are COMPLETELY independent:

- Can issue AR and AW in any order
- Can issue AR while writes are outstanding
- Can issue AW while reads are outstanding
- No required ordering between read and write

Example valid sequence:

```
Cycle 0: AW (write to 0x1000)
Cycle 1: AR (read from 0x2000)
Cycle 2: W (write data)
Cycle 3: AR (another read)
Cycle 4: R (read data response)
Cycle 5: B (write response)
```

All legal! Reads and writes are independent.

### 3.4.6 4KB Boundary Rule

**Critical AXI4 Rule:** A read burst must NOT cross a 4KB address boundary.

**Rationale:** - Same as writes: prevents crossing page boundaries - Critical for MMU/virtual memory systems - 4KB = standard page size

**Checking for Violation:**

```
// Calculate whether burst crosses 4KB boundary
logic [ADDR_WIDTH-1:0] start_addr;
logic [ADDR_WIDTH-1:0] end_addr;
logic [ADDR_WIDTH-1:0] burst_bytes;
logic boundary_crossed;

assign start_addr = araddr;
assign burst_bytes = (arlen + 1) << arsize;
assign end_addr = start_addr + burst_bytes - 1;

// Check if upper bits change (crossed 4KB boundary)
assign boundary_crossed = (start_addr[ADDR_WIDTH-1:12] != end_addr[ADDR_WIDTH-1:12]);

// Assert error if boundary crossed
assert property (@(posedge aclk)
```

```
    arvalid && arready | -> !boundary_crossed  
) else $error("AR burst crosses 4KB boundary!");
```

### Valid vs Invalid Examples:

VALID:

```
araddr = 0x1FF0, arlen = 3 (4 beats), arsize = 2 (4 bytes)  
End = 0x1FF0 + 16 - 1 = 0xFFFF  
Page bits: 0x1 (start) == 0x1 (end) ✓
```

INVALID:

```
araddr = 0x1FF0, arlen = 7 (8 beats), arsize = 2 (4 bytes)  
End = 0x1FF0 + 32 - 1 = 0x200F  
Page bits: 0x1 (start) != 0x2 (end) ✗
```

### 3.4.7 Out-of-Order Read Completion

#### Why Out-of-Order?

Different arid values can complete out of order:

- Slave can prioritize cached reads over uncached
- Fast memories respond before slow memories
- Improves average latency
- Increases throughput

Example:

```
Cycle 0: AR (arid=1, addr in slow memory)  
Cycle 1: AR (arid=2, addr in fast cache)  
Cycle 5: R (rid=2) - cache hit, responds first  
Cycle 20: R (rid=1) - slow memory, responds later
```

#### Same ID Ordering Requirement

Same arid MUST complete in order:

LEGAL:

```
AR: arid=5, addr=0x1000  
AR: arid=5, addr=0x2000  
R: rid=5 (for 0x1000) ← First  
R: rid=5 (for 0x2000) ← Second, in order
```

ILLEGAL:

```
AR: arid=5, addr=0x1000 (transaction A)  
AR: arid=5, addr=0x2000 (transaction B)  
R: rid=5 (for 0x2000) ← Transaction B first  
R: rid=5 (for 0x1000) ← Transaction A second  
^^^
```

VIOLETION! Same ID reordered

## Master ID Management Strategy

Strategy: Use different arid for independent reads

- Read A doesn't depend on Read B → use different IDs
- Read B depends on Read A result → use same ID

Example - Independent:

Read configuration register: arid=0  
Read data buffer: arid=1  
→ Can complete out of order

Example - Dependent:

Read pointer: arid=5  
Read data at pointer: arid=5  
→ Must complete in order

## 3.4.8 Bridge-Specific AR Channel Behavior

### Address Decoding

Bridge router examines araddr to select slave:

```
if (araddr >= SLAVE0_BASE && araddr < SLAVE0_END)
    route to Slave 0
else if (araddr >= SLAVE1_BASE && araddr < SLAVE1_END)
    route to Slave 1
...
else
    generate DECERR response (out of range)
```

### ID Extension

Bridge extends arid for response routing:

Master sends:

arid = [ID\_WIDTH-1:0] = 4'b0101

Bridge extends to:

arid\_extended = {bridge\_id, master\_id}  
arid\_extended = {2'b00, 4'b0101} = 6'b00\_0101  
              ^  ^  
              Bridge Master

This allows R responses to route back to correct master.

### AR Channel Arbitration

Multiple masters request same slave simultaneously:

Step 1: Bridge arbiter selects one master

- Round-robin, fixed priority, or QoS-based
- Uses arqos for priority hints

- Step 2: Winning master's AR connects to slave
- Step 3: Losing masters see arready=0 (backpressure)
- Step 4: After AR transfer completes, arbitrate again

### Outstanding Read Tracking

Bridge tracks outstanding reads per master:

- Increment when AR transfers
- Decrement when R completes (rlast seen)
- Limit to prevent resource exhaustion

If outstanding limit reached:

- already = 0 to that master
- Wait for R responses to complete
- Then allow more AR

## 3.4.9 Common Issues and Debug

### Issue 1: arvalid Dropped Before Handshake

Symptom: AR transaction lost

Cause: Master deasserts arvalid before arready seen

Check:

- arvalid must stay HIGH until (arvalid && arready)
- Master FSM must hold arvalid
- Protocol violation if dropped early

### Issue 2: Unaligned Address

Symptom: Unexpected rresp=SLVERR or DECERR

Cause: araddr not aligned to arsize

Check:

- araddr[arsize-1:0] must == 0
- Example: arsize=2 (4 bytes) → araddr[1:0] must be 00
- Slave may return error for unaligned reads

### Issue 3: arsize Exceeds Bus Width

Symptom: Protocol violation, undefined behavior

Cause: arsize > log2(DATA\_WIDTH/8)

Check:

- 32-bit bus: arsize ≤ 2
- 64-bit bus: arsize ≤ 3
- 128-bit bus: arsize ≤ 4
- Master configuration error

### Issue 4: 4KB Boundary Violation

Symptom: Slave returns error or hangs

Cause: Burst crosses page boundary

Check:

- Calculate: end\_addr = araddr + (arlen+1)×(2^arsize) - 1

- Check:  $\text{araddr}[\text{ADDR\_WIDTH}-1:12] == \text{end\_addr}[\text{ADDR\_WIDTH}-1:12]$
- If different, burst crosses 4KB boundary

### Issue 5: No R Response

Symptom: AR transfers but R never comes

Cause:

- Slave hung or stuck
- R channel disconnected
- Bridge routing error
- Out-of-range address (should get DECERR)

Debug:

- Check slave status
- Check bridge ID routing
- Verify address is in valid range
- Implement timeout mechanism

## 3.4.10 Performance Considerations

### Latency

Read Latency = Time from AR to first R beat

Components:

1. AR to slave (bridge routing delay)
2. Slave memory access
3. R back to master (bridge routing delay)

Typical values:

- On-chip SRAM: 2-5 cycles
- On-chip register: 1-2 cycles
- Off-chip DDR: 20-100+ cycles
- Cache hit: 1-3 cycles
- Cache miss: 20-50+ cycles

### Throughput

Maximum read throughput:

- Issue AR every cycle
- Slave returns R data every cycle
- Use different arid for each to allow out-of-order
- Limited by:
  - \* Outstanding transaction limit
  - \* Slave bandwidth
  - \* Bridge arbitration

### Speculative Reads

Masters can speculatively issue AR:

- Issue AR before knowing if data needed
- Cancel by not accepting R data ( $rready=0$  forever)
- Or discard R data when received

#### Prefetching:

- Issue AR for next cache line before current done
  - Reduces average latency
  - Requires sufficient outstanding capacity
- 

**Related Sections:** - Section 3.1: Write Address Channel (AW) - Section 3.5: Read Data Channel (R) - Section 2.2: Slave Router (address decoding) - Section 2.4: Arbitration - Section 2.5: ID Management

**Next:** [3.5 Read Data and Response Channel \(R\)](#)

## 3.5 Read Data and Response Channel (R)

The Read Data channel combines both read data and response status, returning data from slave to master in response to a read address request. Unlike writes (which use separate W and B channels), reads use a single R channel for data and status.

### 3.5.1 Overview

**Key Characteristics:** - Travels from slave back to master (response path) - Contains both data AND response status (unlike B channel) - One R beat per data beat (arlen + 1 total beats) - CAN interleave by ID (unlike W channel) - Contains transaction ID for matching to AR - Uses valid/ready handshake per beat

### 3.5.2 Channel Signals

```
// Read Data Channel (Slave → Master)
 rvalid;          // Read data valid
 rready;          // Read data ready
 [ID_WIDTH-1:0] rid;    // Read ID (matches arid)
 [DATA_WIDTH-1:0] rdata;   // Read data
 [1:0] rresp;          // Read response status
 rlast;           // Last beat of burst
 [USER_WIDTH-1:0] ruser;   // User-defined sideband
(optional)
```

### 3.5.3 Signal Descriptions

#### rvalid

- **Direction:** Slave → Master
- **Width:** 1 bit

- **Description:** Indicates read data and response are valid
- **Protocol Rules:**
  - Slave asserts when R channel information is valid
  - Must remain asserted until rready is HIGH on rising edge
  - Once asserted, cannot be deasserted until handshake completes
  - Must be LOW during reset
  - Multiple R beats can be pipelined (different rids)

### rready

- **Direction:** Master → Slave
- **Width:** 1 bit
- **Description:** Indicates master is ready to accept read data
- **Protocol Rules:**
  - Master asserts when ready for data
  - Can toggle freely (no dependency on rvalid)
  - Recommended to assert early to maximize throughput
  - Should be HIGH when master has buffer space

### rid

- **Direction:** Slave → Master
- **Width:** ID\_WIDTH bits (matches arid width)
- **Description:** Read response ID - matches original arid
- **Protocol Rules:**
  - Must match arid from corresponding AR transaction
  - Used by master to match data to original request
  - All beats in a burst carry same rid
  - Bridge may use extended ID for response routing
  - Must remain stable while rvalid is asserted

### ID Matching:

Master sends:

AR: arid = 7, arlen = 3 (4 beats)

Slave responds with 4 R beats:

```
R beat 0: rid = 7, rlast = 0
R beat 1: rid = 7, rlast = 0
R beat 2: rid = 7, rlast = 0
R beat 3: rid = 7, rlast = 1
```

All beats have same rid matching arid.

### rdata

- **Direction:** Slave → Master
- **Width:** DATA\_WIDTH bits (typically 32, 64, 128, 256, 512, or 1024)
- **Description:** Read data for current beat
- **Protocol Rules:**
  - Contains data read from memory/registers
  - Valid when rresp == OKAY or EXOKAY
  - May be undefined/invalid when rresp == SLVERR or DECERR
  - Must remain stable while rvalid is asserted
  - Byte order: Little-endian within bus width

### Data Bus Layout (64-bit example):

```
rdata[63:56] - Byte 7  (highest address)
rdata[55:48] - Byte 6
rdata[47:40] - Byte 5
rdata[39:32] - Byte 4
rdata[31:24] - Byte 3
rdata[23:16] - Byte 2
rdata[15: 8] - Byte 1
rdata[ 7: 0] - Byte 0  (lowest address)
```

### Narrow Reads:

For arsize < bus width, only relevant bytes contain valid data:

64-bit bus, reading 4 bytes (arsize=2) from address 0x1000:  
rdata[31:0] = valid data from 0x1000-0x1003  
rdata[63:32] = undefined (don't care)

64-bit bus, reading 1 byte (arsize=0) from address 0x1002:  
araddr[2:0] = 0x2 → byte lane 2  
rdata[23:16] = valid data from 0x1002  
rdata[others] = undefined (don't care)

### rresp

- **Direction:** Slave → Master
- **Width:** 2 bits
- **Description:** Read response status for this beat
- **Protocol Rules:**
  - Indicates success or error of read transaction

- **ALL beats in burst must have same rresp** (AXI4 rule)
- If any error occurs, all beats report error
- Must remain stable while rvalid is asserted

### **Response Encoding:**

rresp = 2'b00 (OKAY):     Read successful  
                          rdata contains valid data  
                          Normal completion

rresp = 2'b01 (EXOKAY): Exclusive access okay  
                          Atomic read succeeded  
                          rdata contains valid data  
                          Rare in most systems

rresp = 2'b10 (SLVERR): Slave error  
                          Slave detected error  
                          rdata may be invalid  
                          Examples:  
                          - Read from write-only register  
                          - Invalid register address  
                          - Permission denied  
                          - ECC uncorrectable error

rresp = 2'b11 (DECERR): Decode error  
                          No slave at this address  
                          Generated by interconnect  
                          rdata is invalid/undefined  
                          Master requested unmapped address

### **Error Propagation Rule:**

If ANY beat encounters error, ALL beats report error:

- DECERR > SLVERR > EXOKAY > OKAY
- Worst error wins

Example burst with 4 beats:

Beat 0: Address 0x1000, can detect it's out-of-range  
 Beat 1-3: Never accessed (error detected on beat 0)

All 4 R beats return: rresp = DECERR

AXI4 Rule: "A slave must use the same RRESP value for all data transfers in a burst"

### **rlast**

- **Direction:** Slave → Master
- **Width:** 1 bit

- **Description:** Indicates last data beat in read burst
- **Protocol Rules:**
  - Asserted on final R beat of transaction
  - Beat count must match arlen + 1 from corresponding AR
  - Only ONE R beat in a burst has rlast=1
  - Must remain stable while rvalid is asserted
  - Critical for burst boundary detection and ID tracking

### **rlast Timing:**

Transaction with arlen=3 (4 beats total):

```
R beat 0: rvalid=1, rlast=0, rid=5 → First beat
R beat 1: rvalid=1, rlast=0, rid=5 → Middle beat
R beat 2: rvalid=1, rlast=0, rid=5 → Middle beat
R beat 3: rvalid=1, rlast=1, rid=5 → LAST beat
```

Single beat (arlen=0):

```
R beat 0: rvalid=1, rlast=1, rid=5 → First AND last
```

### **ruser**

- **Direction:** Slave → Master
- **Width:** USER\_WIDTH bits (optional, configurable)
- **Description:** User-defined sideband signaling
- **Protocol Rules:**
  - User-defined extension to protocol
  - Must remain stable while rvalid is asserted
  - Not all designs include this signal

## 3.5.4 Handshake Timing

### Single Beat Transfer:



rlast                        

### Burst Transfer (4 Beats):



### Slave Wait States:



Beat 0 transferred cycle 1

Beat 1 stalled cycles 2-3 (rvalid=0, slave not ready)

Beat 1 transferred cycle 4

Beat 2 stalled cycle 5

Beat 2 transferred cycle 6

### Master Backpressure:



Beat 0 transferred cycle 1

```
Beat 1 stalled cycles 2-4 (rready=0, master not ready)
Beat 1 transferred cycle 5
```

### 3.5.5 R Channel Interleaving

**KEY DIFFERENCE:** Unlike W channel, R channel **CAN** interleave by ID.

#### Why Interleaving?

Different rid values can be interleaved:

- Allows out-of-order completion
- Slave doesn't need to buffer full bursts
- Can return data as soon as available
- Improves throughput and latency

This is the PRIMARY mechanism for out-of-order reads in AXI4.

#### Interleaving Rules

Rule 1: R beats with SAME rid must be consecutive (no interleaving)

Rule 2: R beats with DIFFERENT rid CAN be interleaved

Rule 3: Within same rid, beats must be in order (beat 0, 1, 2...)

LEGAL:

```
AR: arid=5, arlen=3 (4 beats needed)
AR: arid=7, arlen=1 (2 beats needed)
```

```
R: rid=5, beat 0, rlast=0
R: rid=5, beat 1, rlast=0
R: rid=7, beat 0, rlast=0 ← Interleaved! Different rid
R: rid=7, beat 1, rlast=1 ← Complete rid=7
R: rid=5, beat 2, rlast=0 ← Back to rid=5
R: rid=5, beat 3, rlast=1 ← Complete rid=5
```

ILLEGAL:

```
R: rid=5, beat 0, rlast=0
R: rid=5, beat 1, rlast=0
R: rid=5, beat 3, rlast=0 ← WRONG! Skipped beat 2
R: rid=5, beat 2, rlast=1
```

#### Interleaving Examples

##### Example 1: Fast Read Overtakes Slow Read

Cycle 0: AR (arid=1, slow memory address)  
Cycle 1: AR (arid=2, cached address)

Slave responses:

```
Cycle 5: R (rid=2, beat 0, rlast=1) ← Cache hit, completes fast
Cycle 20: R (rid=1, beat 0)           ← Slow memory
Cycle 21: R (rid=1, beat 1, rlast=1)
```

rid=2 completed before rid=1 even though issued later!

### Example 2: Interleaved Multi-Beat Bursts

Cycle 0: AR (arid=3, arlen=3, 4 beats)  
Cycle 1: AR (arid=4, arlen=1, 2 beats)

Slave responses (interleaved):

Cycle 10: R (rid=3, beat 0, rlast=0)  
Cycle 11: R (rid=3, beat 1, rlast=0)  
Cycle 12: R (rid=4, beat 0, rlast=0) ← Interleave starts  
Cycle 13: R (rid=4, beat 1, rlast=1) ← Complete rid=4  
Cycle 14: R (rid=3, beat 2, rlast=0) ← Back to rid=3  
Cycle 15: R (rid=3, beat 3, rlast=1) ← Complete rid=3

## 3.5.6 Ordering Rules

### In-Order for Same ID

**Rule:** R responses with same rid must complete in order of their AR.

LEGAL:

Cycle 0: AR (arid=5, addr=0x1000)  
Cycle 1: AR (arid=5, addr=0x2000)  
Cycle 5: R (rid=5... rlast=1) ← For first AR (0x1000)  
Cycle 6: R (rid=5... rlast=1) ← For second AR (0x2000)

ILLEGAL:

Cycle 0: AR (arid=5, addr=0x1000) transaction A  
Cycle 1: AR (arid=5, addr=0x2000) transaction B  
Cycle 5: R (rid=5... rlast=1) ← For transaction B  
Cycle 6: R (rid=5... rlast=1) ← For transaction A  
^^^

VIOLATION! Same ID out of order

### Out-of-Order for Different IDs

**Rule:** R responses with different rid CAN complete out of order.

LEGAL:

Cycle 0: AR (arid=5)  
Cycle 1: AR (arid=7)  
Cycle 5: R (rid=7... rlast=1) ← Second AR completes first  
Cycle 6: R (rid=5... rlast=1) ← First AR completes second

This is the whole point of having IDs!

### 3.5.7 Data Width Conversion

When masters and slaves have different data widths, the bridge inserts width converters.

#### Upsizing (Narrow Slave → Wide Master)

##### Example: 32-bit slave, 64-bit master

Slave returns (2 beats):

```
R beat 0: rdata[31:0] = 0xAAAA_AAAA  
R beat 1: rdata[31:0] = 0xBBBB_BBBB, rlast=1
```

Converter packs → Master sees (1 beat):

```
R beat 0: rdata[63:0] = 0xBBBB_BBBB_AAAA_AAAA, rlast=1
```

#### Downsizing (Wide Slave → Narrow Master)

##### Example: 64-bit slave, 32-bit master

Slave returns (1 beat):

```
R beat 0: rdata[63:0] = 0xBBBB_BBBB_AAAA_AAAA, rlast=1
```

Converter splits → Master sees (2 beats):

```
R beat 0: rdata[31:0] = 0xAAAA_AAAA, rlast=0  
R beat 1: rdata[31:0] = 0xBBBB_BBBB, rlast=1
```

See Section 2.6 for full width conversion details.

### 3.5.8 Error Handling

#### OKAY - Normal Read

rresp = 2'b00 (OKAY)

Meaning:

- Read successful
- rdata contains valid data from slave
- Normal case

Master Action:

- Accept rdata
- Use data for computation
- Transaction complete when rlast=1

## **SLVERR - Slave Error**

rresp = 2'b10 (SLVERR)

Possible Causes:

- Read from write-only register
- Address within slave range but invalid
- Protected/secure access violation
- ECC uncorrectable error
- Slave internal error

Master Action:

- rdata may be invalid/undefined
- Do NOT use rdata for critical operations
- Log error
- May retry or report to software

Slave Behavior:

- ALL beats in burst must return SLVERR
- rdata can be any value (master should ignore)
- rlast still indicates burst boundary

## **DECERR - Decode Error**

rresp = 2'b11 (DECERR)

Causes:

- Address not mapped to any slave
- Bridge router detected out-of-range address
- Generated by interconnect, not by slave

Master Action:

- rdata is invalid/undefined
- Do NOT use rdata
- Log serious error
- Do NOT retry (address is wrong)
- Report to software

Bridge Generation:

- Bridge generates DECERR for unmapped addresses
- Returns (arlen+1) R beats with rresp=DECERR
- rdata = pattern (e.g., 0xDEADCAFE) or undefined
- rid = captured arid
- rlast on final beat

**Bridge DECERR Generation:**

```
// Bridge generates DECERR for out-of-range addresses
always_ff @(posedge aclk) begin
    if (ar_out_of_range && arvalid && arready) begin
```

```

    // Capture transaction details
    oor_arid <= arid;
    oor_arlen <= arlen;
    oor_count <= 0;
    oor_generating <= 1'b1;
end else if (oor_generating) begin
    // Generate R beats with DECERR
    rvalid <= 1'b1;
    rid <= oor_arid;
    rdata <= ERROR_PATTERN; // e.g., 0xDEADBEEF
    rresp <= 2'b11; // DECERR
    rlast <= (oor_count == oor_arlen);

    if (rready) begin
        oor_count <= oor_count + 1;
        if (rlast) oor_generating <= 1'b0;
    end
end
end

```

### EXOKAY - Exclusive Okay

rresp = 2'b01 (EXOKAY)

Used For:

- Atomic/exclusive reads
- Part of load-linked/store-conditional
- Semaphore operations

Meaning:

- Exclusive read succeeded
- Exclusive monitor tagged this address
- Subsequent exclusive write can succeed

Note:

- Rare in typical systems
- Requires exclusive access monitor
- Most bridges pass through

## 3.5.9 Bridge-Specific R Channel Considerations

### Response Routing

Multi-master bridge must route R to correct master:

Method 1: ID-based routing (most common)  
rid = {bridge\_id, original\_master\_id}  
Bridge extracts bridge\_id bits to route R

Method 2: CAM tracking

Bridge records which master sent which arid

Uses CAM to match rid to original master  
See Section 2.5 for details

Both methods work with R interleaving.

### R Channel Merging

Bridge merges R from multiple slaves → master:

Challenge: Multiple slaves may have R data ready simultaneously

Solution: R channel arbiter

- Round-robin or priority-based
- Considers rid to allow interleaving
- Must not split bursts (wait for rlast)

Example:

Slave 0: Has R data ready (rid=3)

Slave 1: Has R data ready (rid=5)

Arbiter grants Slave 0

Transfers all beats for rid=3 until rlast

Then can grant Slave 1 for rid=5

### Response Buffering

Bridge may buffer R data:

- Small FIFO (2-8 entries typical)
- Allows slave to push data ahead
- Reduces backpressure
- Master can be slow ( $rready=0$ )

Depth considerations:

- Minimum: 2 (ping-pong)
- Typical: 4-8 (good balance)
- High performance: 16-32 (deep buffering)

### Outstanding Transaction Limits

Bridge must track outstanding reads:

- AR issued but R not complete ( $rlast$  not seen)
- Each slave has limit (e.g., 16 outstanding)
- Prevents resource exhaustion

If limit reached:

- $rready = 0$  (backpressure AR channel)
- Wait for R completions ( $rlast$ )
- Then allow more AR

### 3.5.10 Performance Implications

#### Latency

Read latency = AR to last R beat ( $rlast=1$ )

Minimum (best case):

- Register read: 1-2 cycles
- SRAM read: 2-3 cycles

Typical:

- On-chip memory: 3-10 cycles
- Cache hit: 2-5 cycles

Maximum (worst case):

- Cache miss → off-chip: 50-200 cycles
- Slow peripheral: 100+ cycles

#### Throughput

Maximum throughput:

- R data every cycle
- Requires:
  - \* Slave rvalid always HIGH
  - \* Master rready always HIGH
  - \* Deep pipelining
  - \* Multiple outstanding reads

Bottlenecks:

- Slave memory bandwidth
- Bridge arbitration
- Master buffer space ( $rready=0$ )
- Outstanding limit reached

#### Optimizations

##### 1. Pipelining:

Issue multiple ARs before first R returns:

- Cycle 0: AR (arid=0)
- Cycle 1: AR (arid=1)
- Cycle 2: AR (arid=2)
- Cycle 5: R (rid=0) ... ← First data arrives
- Cycle 6: R (rid=1) ...
- Cycle 7: R (rid=2) ...

Hides AR-to-R latency.

##### 2. Interleaving:

Use different arids for independent reads:

- Allows out-of-order completion

- Slave can return cached data immediately
- Doesn't block on slow reads

### 3. Burst Reads:

Use longer bursts where possible:

- arlen=15 (16 beats) better than 16× arlen=0
- Amortizes AR overhead
  - Better memory efficiency
  - Higher throughput

## 3.5.11 Common Issues and Debug

### Issue 1: Missing R Data

Symptom: AR transfers but R never arrives

Causes:

- Slave hung
- R channel disconnected
- Bridge routing error
- Out-of-range address (should get DECERR)

Debug:

- Check slave status
- Verify bridge routing logic
- Check address map
- Implement timeout

### Issue 2: rid Mismatch

Symptom: rid doesn't match any outstanding arid

Causes:

- Bridge routing error
- ID corruption
- Spurious R response

Debug:

- Track outstanding arids
- Compare rid to pending list
- Check bridge ID extension

### Issue 3: Wrong Beat Count

Symptom: Number of R beats doesn't match arlen+1

Causes:

- rlast logic error
- Beat counter bug
- Lost R beats

Debug:

- Count R beats from first to rlast

- Must equal (arlen+1) from corresponding AR
- Check rlast generation logic

#### Issue 4: Illegal R Interleaving

Symptom: R beats with same rid not consecutive

Causes:

- Slave implementation error
- Bridge merging bug

Debug:

- Track rid sequence
- Verify same rid beats are consecutive
- Check for interleaving violations

#### Issue 5: rdata Invalid on Error

Symptom: Master uses rdata despite rresp=SLVERR/DECERR

Causes:

- Master not checking rresp
- Software bug

Debug:

- Always check rresp before using rdata
- Add assertion: if (rresp != OKAY) don't use rdata

### 3.5.12 Verification Checks

```
// Assertion: Every AR gets (arlen+1) R beats
property ar_r_correspondence;
    logic [7:0] expected_beats;
    @(posedge aclk) disable iff (!aresetn)
        (arvalid && arready, expected_beats = arlen + 1) |>
        ##[1:MAX_LATENCY]
        (rvalid && rready && rid == $past(arid), expected_beats--)
[*expected_beats];
endproperty

// Assertion: rlast only on final beat
property rlast_on_final;
    @(posedge aclk) disable iff (!aresetn)
        (rvalid && rready && rlast) |> (beat_count == expected_beats);
endproperty

// Assertion: Same rid beats consecutive
property same_rid_consecutive;
    @(posedge aclk) disable iff (!aresetn)
        (rvalid && rready && !rlast && rid == ID_X) |>
        ##1 (rvalid && rready) |> (rid == ID_X);
endproperty
```

```

// Assertion: Same rresp for all beats in burst
property same_rresp_in_burst;
    logic [1:0] first_rresp;
    @(posedge aclk) disable iff (!aresetn)
        (rvalid && rready && !rlast, first_rresp = rresp) |->
        (rvalid && rready [->1:$] until rlast) |-> (rresp == first_rresp);
endproperty

```

---

**Related Sections:** - Section 3.4: Read Address Channel (AR) - Section 3.2: Write Data Channel (W) - contrast with R interleaving - Section 2.5: ID Management - Section 2.6: Width Conversion - Section 2.8: Response Routing - Section 2.9: Error Handling

**Next:** [Chapter 4: Usage Examples](#)

## Channel-Specific Masters (Phase 2 Feature)

### Overview

**Phase 2 Key Feature:** Support for channel-specific AXI4 masters that only implement needed channels (write-only, read-only, or full).

**Why This Matters:** - Real hardware often has dedicated read or write masters - Full 5-channel interfaces waste resources for dedicated masters - Matches actual accelerator architectures (RAPIDS, STREAM) - Reduces port count, logic, and synthesis time

### The Problem

#### Traditional AXI4 Crossbar Generation:

All masters get all 5 AXI4 channels, even if they only perform reads or writes:

```

// x WASTEFUL: Write-only master gets unused read channels
input logic [63:0] rapids_descr_m_axi_awaddr, // ✓ USED (write address)
input logic [511:0] rapids_descr_m_axi_wdata, // ✓ USED (write data)
output logic [7:0] rapids_descr_m_axi_bid, // ✓ USED (write response)
input logic [63:0] rapids_descr_m_axi_araddr, // x UNUSED (read address)
output logic [511:0] rapids_descr_m_axi_rdata, // x UNUSED (read data)

```

- Resource Waste:** - 50% of ports unused for dedicated masters - Width converters instantiated for unused directions - Internal crossbar wiring for unused channels
- Verification coverage for unused paths

### *The Solution: channels Field*

#### **CSV Configuration Enhancement:**

Add channels field to ports.csv:

```
port_name,direction,protocol,channels,prefix,data_width,addr_width,id_width,base_addr,addr_range
rapids_descr_wr,master,axi4,wr,rapids_descr_m_axi_,512,64,8,N/A,N/A
rapids_sink_wr,master,axi4,wr,rapids_sink_m_axi_,512,64,8,N/A,N/A
rapids_src_rd,master,axi4,rd,rapids_src_m_axi_,512,64,8,N/A,N/A
stream_master,master,axi4,rw,stream_m_axi_,512,64,8,N/A,N/A
cpu_master,master,axi4,rw,cpu_m_axi_,64,32,4,N/A,N/A
```

- Channel Values:** - **rw** - Full read/write master (all 5 channels: AW, W, B, AR, R) -
- wr** - Write-only master (3 channels: AW, W, B) - **rd** - Read-only master (2 channels: AR, R)

- Backward Compatibility:** - channels field is optional - Defaults to rw if not specified - Existing CSV files work unchanged

### *Generated RTL Differences*

#### **Write-Only Master (channels=wr):**

```
// Master port: rapids_descr_wr (512b AXI4 [WR], prefix:
rapids_descr_m_axi_)
// Write Address Channel
 logic [63:0] rapids_descr_m_axi_awaddr,
 logic [7:0] rapids_descr_m_axi_awid,
 logic [7:0] rapids_descr_m_axi_awlen,
 logic [2:0] rapids_descr_m_axi_awsize,
 logic [1:0] rapids_descr_m_axi_awburst,
 logic rapids_descr_m_axi_awlock,
 logic [3:0] rapids_descr_m_axi_awcache,
 logic [2:0] rapids_descr_m_axi_awprot,
 logic rapids_descr_m_axi_awvalid,
 logic rapids_descr_m_axi_awready,
// Write Data Channel
 logic [511:0] rapids_descr_m_axi_wdata,
 logic [63:0] rapids_descr_m_axi_wstrb,
 logic rapids_descr_m_axi_wlast,
 logic rapids_descr_m_axi_wvalid,
 logic rapids_descr_m_axi_wready,
// Write Response Channel
```

```

output logic [7:0]      rapids_descr_m_axi_bid,
output logic [1:0]      rapids_descr_m_axi_bresp,
output logic          rapids_descr_m_axi_bvalid,
input logic           rapids_descr_m_axi_bready

// ✓ NO READ CHANNELS (araddr, arid, rdata, etc.)

```

### **Read-Only Master (channels=rd):**

```

// Master port: rapids_src_rd (512b AXI4 [RD], prefix:
rapids_src_m_axi_)
// Read Address Channel
input logic [63:0]      rapids_src_m_axi_araddr,
input logic [7:0]       rapids_src_m_axi_arid,
input logic [7:0]       rapids_src_m_axi_arlen,
input logic [2:0]       rapids_src_m_axi_arsize,
input logic [1:0]       rapids_src_m_axi_arburst,
input logic           rapids_src_m_axi_arlock,
input logic [3:0]       rapids_src_m_axi_arcache,
input logic [2:0]       rapids_src_m_axi_arprot,
input logic           rapids_src_m_axi_arvalid,
output logic          rapids_src_m_axi_arready,
// Read Data Channel
output logic [511:0]    rapids_src_m_axi_rdata,
output logic [7:0]       rapids_src_m_axi_rid,
output logic [1:0]       rapids_src_m_axi_rresp,
output logic           rapids_src_m_axi_rlast,
output logic           rapids_src_m_axi_rvalid,
input logic            rapids_src_m_axi_rready

// ✓ NO WRITE CHANNELS (awaddr, awid, wdata, bid, etc.)

```

### **Full Master (channels=rw):**

```

// Master port: stream_master (512b AXI4, prefix: stream_m_axi_)
// Write Address Channel
input logic [63:0]      stream_m_axi_awaddr,
// ... (all write channel signals)
// Read Address Channel
input logic [63:0]      stream_m_axi_araddr,
// ... (all read channel signals)

// ✓ ALL 5 CHANNELS PRESENT

```

### *Width Converter Optimization*

### **Separate Write and Read Converters:**

For channel-specific masters, width converters are only instantiated for needed directions:

#### Write-Only Master (512b → 512b crossbar):

```
// Width Converter (WRITE) - Master 0: rapids_descr_wr [WR]
// No converter needed - direct connection (same width)
assign xbar_m_awaddr[0] = rapids_descr_m_axi_awaddr;
assign xbar_m_awid[0] = rapids_descr_m_axi_awid;
// ... (write channel direct assignments)

// ✓ NO READ CONVERTER INSTANCE (master doesn't have read channels)
```

#### Read-Only Master (512b → 512b crossbar):

```
// Width Converter (READ) - Master 2: rapids_src_rd [RD]
// No converter needed - direct connection (same width)
assign xbar_m_araddr[2] = rapids_src_m_axi_araddr;
assign xbar_m_arid[2] = rapids_src_m_axi_arid;
// ... (read channel direct assignments)

// ✓ NO WRITE CONVERTER INSTANCE (master doesn't have write channels)
```

#### Full Master with Width Mismatch (64b → 512b crossbar):

```
// Width Converter (WRITE) - Master 4: cpu_master
axi4_dwidth_converter_wr #(
    .S_AXI_DATA_WIDTH(64),
    .M_AXI_DATA_WIDTH(512),
    // ...
) u_wconv_m4_wr (
    .aclk          (aclk),
    .aresetn      (aresetn),
    // Slave interface (64b external)
    .s_axi_awaddr (cpu_m_axi_awaddr),
    // Master interface (512b crossbar)
    .m_axi_awaddr (xbar_m_awaddr[4]),
    // ...
);

// Width Converter (READ) - Master 4: cpu_master
axi4_dwidth_converter_rd #(
    .S_AXI_DATA_WIDTH(64),
    .M_AXI_DATA_WIDTH(512),
    // ...
) u_wconv_m4_rd (
    .aclk          (aclk),
    .aresetn      (aresetn),
    // Slave interface (64b external)
    .s_axi_araddr (cpu_m_axi_araddr),
```

```

    // Master interface (512b crossbar)
    .m_axi_araddr (xbar_m_araddr[4]),
    // ...
);

// ✓ BOTH CONVERTERS (full master with width mismatch)

```

### *Direct Connection Optimization*

#### **Channel-Aware Wiring:**

When no width conversion is needed, only existing channels are wired:

#### **Write-Only Master Direct Connection:**

```

// Master 0: rapids_descr_wr [WR] (direct connection)

// Write channels only
assign xbar_m_awaddr[0] = rapids_descr_m_axi_awaddr;
assign xbar_m_awid[0] = rapids_descr_m_axi_awid;
assign xbar_m_awlen[0] = rapids_descr_m_axi_awlen;
// ... (all write channel signals)

assign rapids_descr_m_axi_bid = xbar_m_bid[0];
assign rapids_descr_m_axi_bresp = xbar_m_bresp[0];
assign rapids_descr_m_axi_bvalid = xbar_m_bvalid[0];
assign xbar_m_bready[0] = rapids_descr_m_axi_bready;

// ✓ NO READ CHANNEL ASSIGNMENTS (signals don't exist)

```

#### **Read-Only Master Direct Connection:**

```

// Master 2: rapids_src_rd [RD] (direct connection)

// Read channels only
assign xbar_m_araddr[2] = rapids_src_m_axi_araddr;
assign xbar_m_arid[2] = rapids_src_m_axi_arid;
assign xbar_m_arlen[2] = rapids_src_m_axi_arlen;
// ... (all read channel signals)

assign rapids_src_m_axi_rdata = xbar_m_rdata[2];
assign rapids_src_m_axi_rid = xbar_m_rid[2];
assign rapids_src_m_axi_rrresp = xbar_m_rrresp[2];
assign rapids_src_m_axi_rvalid = xbar_m_rvalid[2];
assign xbar_m_rready[2] = rapids_src_m_axi_rready;

// ✓ NO WRITE CHANNEL ASSIGNMENTS (signals don't exist)

```

## *Resource Savings*

### **Example: 3 Masters, 2 Slaves, All Channel-Specific**

#### **Configuration:**

```
rapids_descr_wr,master,axi4,wr,rapids_descr_m_axi_,512,64,8,N/A,N/A  
rapids_sink_wr,master,axi4,wr,rapids_sink_m_axi_,512,64,8,N/A,N/A  
rapids_src_rd,master,axi4,rd,rapids_src_m_axi_,512,64,8,N/A,N/A
```

#### **Port Count Comparison:**

| Master              | Traditional<br>(rw) | Channel-<br>Specific | Reduction   |
|---------------------|---------------------|----------------------|-------------|
| rapids_descr_<br>wr | 61 signals          | 37 signals           | -39%        |
| rapids_sink_wr      | 61 signals          | 37 signals           | -39%        |
| rapids_src_rd       | 61 signals          | 24 signals           | -61%        |
| <b>Total</b>        | <b>183 signals</b>  | <b>98 signals</b>    | <b>-46%</b> |

**Logic Savings:** - No unused width converter instances - No unused crossbar channel wiring - Reduced multiplexer/demultiplexer logic - Faster synthesis and place & route

#### *Implementation Details*

#### **PortSpec Class Enhancement:**

```
@dataclass  
class PortSpec:  
    port_name: str  
    direction: str  
    protocol: str  
    channels: str = 'rw' # NEW: Default to full read/write  
    prefix: str = ''  
    data_width: int = 0  
    # ... other fields  
  
    def has_write_channels(self) -> bool:  
        """True if this port has write channels (AW, W, B)"""  
        return self.channels in ['rw', 'wr']  
  
    def has_read_channels(self) -> bool:  
        """True if this port has read channels (AR, R)"""  
        return self.channels in ['rw', 'rd']
```

## CSV Validation:

```
# Validate channels value
if channels not in ['rw', 'rd', 'wr']:
    print(f" WARNING: Invalid channels '{channels}' for {port_name}, defaulting to 'rw'")
    channels = 'rw'
```

## Conditional Generation:

```
# Generate port signals based on channel type
if port.has_write_channels():
    port_str += generate_write_channels(port)

if port.has_read_channels():
    port_str += generate_read_channels(port)
```

## Usage Example

### RAPIDS-Style Configuration:

```
# example_ports_channels.csv
port_name,direction,protocol,channels,prefix,data_width,addr_width,id_
width,base_addr,addr_range
rapids_descr_wr,master,axi4,wr,rapids_descr_m_axi_,512,64,8,N/A,N/A
rapids_sink_wr,master,axi4,wr,rapids_sink_m_axi_,512,64,8,N/A,N/A
rapids_src_rd,master,axi4,rd,rapids_src_m_axi_,512,64,8,N/A,N/A
stream_master,master,axi4,rw,stream_m_axi_,512,64,8,N/A,N/A
cpu_master,master,axi4,rw,cpu_m_axi_,64,32,4,N/A,N/A
apb_periph0,slave,apb,rw,apb0_,32,32,N/A,0x00000000,0x00010000
ddr_controller,slave,axi4,rw,ddr_s_axi_,512,64,8,0x80000000,0x80000000
sram_buffer,slave,axi4,rw,sram_s_axi_,512,48,8,0x40000000,0x10000000
```

## Generate:

```
python3 bridge_csv_generator.py \
--ports example_ports_channels.csv \
--connectivity example_connectivity_channels.csv \
--name bridge_channels_demo \
--output /tmp/bridge_test/
```

## Verification:

```
# Verify write-only master has NO read channels
grep -c "rapids_descr_m_axi_ar\|rapids_descr_m_axi_r[^e]" \
bridge_channels_demo.sv
# Output: 0 ✓
```

```
# Verify read-only master has NO write channels
grep -c "rapids_src_m_axi_aw\|rapids_src_m_axi_w[^a]\|"
```

```
rapids_src_m_axi_b" bridge_channels_demo.sv  
# Output: 0 ✓
```

### *Benefits Summary*

**Resource Efficiency:** - 40-60% fewer ports for dedicated masters - Only necessary width converters instantiated - Reduced internal crossbar wiring - Faster synthesis and smaller netlists

**Design Clarity:** - CSV clearly shows master capabilities - Generated RTL matches actual hardware - Easier verification (no unused paths) - Self-documenting configuration

**Real-World Applicability:** - Matches RAPIDS accelerator architecture - Matches STREAM datapath architecture - Common pattern in custom SoCs - FPGA resource optimization

---

**Next:** [2.5 - Converter Insertion](#)

## **Generated RTL Module Structure**

### *Overview*

The Bridge generator creates complete SystemVerilog crossbar modules from CSV configuration files. Each generated module contains all necessary logic for address decoding, arbitration, data multiplexing, and response routing across the specified master-slave topology.

**Generated Module Naming:** bridge\_axi4\_flat\_<M>x<S> where M = number of masters, S = number of slaves

### *Example Configurations*

**2x2 Configuration:** 2 masters connecting to 2 slaves



## Bridge 2x2 Block Diagram

**1x4 Configuration:** 1 master connecting to 4 slaves



*Bridge 1x4 Block Diagram*

## Module Organization

### Top-Level Structure:

```
module bridge_axi4_flat_2x2 #(
    parameter int NUM_MASTERS = 2,
    parameter int NUM_SLAVES = 2,
    parameter int DATA_WIDTH = 32,
    parameter int ADDR_WIDTH = 32,
    parameter int ID_WIDTH = 4,
    parameter int STRB_WIDTH = 4
) (
    // Clock and reset
    input logic aclk,
    input logic aresetn,

    // Master-side interfaces (slave ports on bridge)
    // Array of AXI4 signals [NUM_MASTERS]
    input logic [ADDR_WIDTH-1:0] s_axi_awaddr [NUM_MASTERS],
    input logic [ID_WIDTH-1:0]   s_axi_awid  [NUM_MASTERS],
    // ... all AW, W, B, AR, R channel signals

    // Slave-side interfaces (master ports on bridge)
    // Array of AXI4 signals [NUM_SLAVES]
    output logic [ADDR_WIDTH-1:0] m_axi_awaddr [NUM_SLAVES],
    output logic [ID_WIDTH-1:0]   m_axi_awid  [NUM_SLAVES],
    // ... all AW, W, B, AR, R channel signals
);
```

### Key Organizational Principles:

#### 1. Packed Arrays for Ports

- Master signals: `s_axi_*[NUM_MASTERS]`
- Slave signals: `m_axi_*[NUM_SLAVES]`
- Enables scalable, parameterized instantiation

#### 2. Modular Internal Organization

- Address decode logic (AW, AR separate)
- Per-slave arbitration FSMs (AW, AR independent)
- Data multiplexing (W follows AW, R follows AR)
- Response demultiplexing (B, R route by ID)

#### 3. Clear Signal Naming Convention

- Request matrices: `aw_request_matrix[s][m], ar_request_matrix[s][m]`
- Grant matrices: `aw_grant_matrix[s][m], ar_grant_matrix[s][m]`
- Grant tracking: `aw_last_grant[s], ar_last_grant[s]`

- Grant state: aw\_grant\_active[s], ar\_grant\_active[s]

### *Internal Structure Breakdown*

#### 1. Address Decode Logic

```

//=====
=====

// Write Address Decode (AW channel)
//=====

logic [NUM_MASTERS-1:0] aw_request_matrix [NUM_SLAVES];

always_comb begin
    // Initialize all write address requests to zero
    for (int s = 0; s < NUM_SLAVES; s++) begin
        aw_request_matrix[s] = '0;
    end

    // Decode AWADDR to slave for each master
    for (int m = 0; m < NUM_MASTERS; m++) begin
        if (s_axi_awvalid[m]) begin
            // Check master's address against all slave ranges
            if (s_axi_awaddr[m] < 32'h10000000)
                aw_request_matrix[0][m] = 1'b1; // Slave 0

            if (s_axi_awaddr[m] >= 32'h10000000 &&
                s_axi_awaddr[m] < 32'h20000000)
                aw_request_matrix[1][m] = 1'b1; // Slave 1
        end
    end
end

```

**Purpose:** - Examines each master's address (AWADDR or ARADDR) - Compares against slave address ranges - Generates request bit per master-slave pair - Separate decode for write (AW) and read (AR) channels

#### 2. Arbiter FSMs (Per-Slave, Per-Direction)

Each slave has two independent arbiter FSMs: - **AW Arbiter** - Arbitrates write address requests - **AR Arbiter** - Arbitrates read address requests

**Total FSM Count:**  $2 \times \text{NUM\_SLAVES}$

See [Chapter 3.2 - Arbiter FSMs](#) for detailed FSM documentation.

### 3. Data Multiplexing

#### W Channel (Write Data):

```
// W channel follows AW grant (locked until WLAST)
generate
    for (genvar s = 0; s < NUM_SLAVES; s++) begin : gen_w_mux
        always_comb begin
            m_axi_wdata[s] = '0;
            m_axi_wstrb[s] = '0;
            m_axi_wlast[s] = 1'b0;
            m_axi_wvalid[s] = 1'b0;

            // Multiplex granted master's W signals
            for (int m = 0; m < NUM_MASTERS; m++) begin
                if (aw_grant_matrix[s][m]) begin
                    m_axi_wdata[s] = s_axi_wdata[m];
                    m_axi_wstrb[s] = s_axi_wstrb[m];
                    m_axi_wlast[s] = s_axi_wlast[m];
                    m_axi_wvalid[s] = s_axi_wvalid[m];
                end
            end
        end
    end
endgenerate
```

#### R Channel (Read Data):

```
// R channel follows AR grant (locked until RLAST)
generate
    for (genvar s = 0; s < NUM_SLAVES; s++) begin : gen_r_demux
        always_comb begin
            for (int m = 0; m < NUM_MASTERS; m++) begin
                s_axi_rdata[m] = '0;
                s_axi_rid[m] = '0;
                s_axi_rrresp[m] = '0;
                s_axi_rlast[m] = 1'b0;
                s_axi_rvalid[m] = 1'b0;

                if (ar_grant_matrix[s][m]) begin
                    s_axi_rdata[m] = m_axi_rdata[s];
                    s_axi_rid[m] = m_axi_rid[s];
                    s_axi_rrresp[m] = m_axi_rrresp[s];
                    s_axi_rlast[m] = m_axi_rlast[s];
                    s_axi_rvalid[m] = m_axi_rvalid[s];
                end
            end
        end
    end
endgenerate
```

## 4. Response Demultiplexing

### B Channel (Write Response):

```
// B channel routed by ID (supports out-of-order completion)
// TODO Phase 2: Use ID lookup table instead of grant matrix
```

Currently follows AW grant matrix, but should use ID-based routing for true out-of-order support.

### Scalability Characteristics

### Module Complexity vs Configuration:

| Configuration | FSM Count | Mux/Demux Count | Address Decoders | Lines of RTL |
|---------------|-----------|-----------------|------------------|--------------|
| 1×1           | 2         | 4               | 2                | ~400         |
| 2×2           | 4         | 8               | 2                | ~600         |
| 4×4           | 8         | 16              | 2                | ~900         |
| 8×8           | 16        | 32              | 2                | ~1500        |

**Resource Scaling:** - **FSMs:** Linear with NUM\_SLAVES (2 per slave) - **Mux Logic:** Linear with NUM\_MASTERS × NUM\_SLAVES - **Address Decode:** Fixed (2 decoders regardless of topology) - **Synthesis Impact:** Minimal for up to 8×8 configurations

### Integration Example

### Instantiation Pattern:

```
// Instantiate generated 2x2 crossbar
bridge_axi4_flat_2x2 #(
    .NUM_MASTERS(2),
    .NUM_SLAVES(2),
    .DATA_WIDTH(32),
    .ADDR_WIDTH(32),
    .ID_WIDTH(4),
    .STRB_WIDTH(4)
) u_bridge (
    .aclk        (sys_clk),
    .aresetn    (sys_rst_n),
    // Master 0 (CPU)
    .s_axi_awaddr ({cpu_awaddr}),
    .s_axi_awid  ({cpu_awid}),
    // ... all CPU AXI4 signals
```

```

// Master 1 (DMA)
.s_axi_awaddr ({dma_awaddr}),
.s_axi_awid   ({dma_awid}),
// ... all DMA AXI4 signals

// Slave 0 (DDR)
.m_axi_awaddr ({ddr_awaddr}),
.m_axi_awid   ({ddr_awid}),
// ... all DDR AXI4 signals

// Slave 1 (SRAM)
.m_axi_awaddr ({sram_awaddr}),
.m_axi_awid   ({sram_awid}),
// ... all SRAM AXI4 signals
);

```

**Note:** Array indexing syntax varies by tool - some tools require explicit array element connections.

### *Code Organization Sections*

**Generated module contains these distinct sections:**

1. **Module Header & Parameters** (~20 lines)
  - Module declaration
  - Parameter definitions
  - Port list with packed arrays
2. **Address Decode Logic** (~50 lines per direction)
  - AW decode: AWADDR → slave request matrix
  - AR decode: ARADDR → slave request matrix
  - Combinational logic with address range checks
3. **Arbiter FSMs** (~40 lines per slave per direction)
  - AW arbiter: Round-robin with grant locking
  - AR arbiter: Independent from AW arbiter
  - Sequential logic with reset handling
4. **Channel Multiplexing** (~30 lines per slave per direction)
  - AW channel: Master → selected slave
  - W channel: Follows AW grant, locked until WLAST
  - AR channel: Master → selected slave
  - R channel: Follows AR grant, locked until RLAST
5. **Response Demultiplexing** (~30 lines per slave per direction)
  - B channel: Slave → ID-matched master (TODO: ID table)

- R channel: Slave → grant-matched master
- Handles valid/ready handshaking

## 6. Ready Signal Generation (~20 lines)

- Master AWREADY: OR of all slave grants
- Master ARREADY: OR of all slave grants
- Slave WREADY: Grant-based routing
- Slave RREADY: Grant-based routing

### *Comparison with Other Crossbar Generators*

| Feature                 | Manual RTL    | Bridge Generator    | Commercial Tools      |
|-------------------------|---------------|---------------------|-----------------------|
| <b>Configuration</b>    | Code editing  | CSV files           | GUI/scripts           |
| <b>Generation Time</b>  | Hours-days    | Seconds             | Minutes               |
| <b>Custom Prefixes</b>  | Manual        | Automatic           | Limited               |
| <b>Channel-Specific</b> | Manual        | Yes (wr/rd/rw)      | Sometimes             |
| <b>Verification</b>     | Manual        | Framework available | Usually included      |
| <b>Cost</b>             | Engineer time | Free                | \$\$\$\$-\$<br>\$\$\$ |

### *Future Enhancements*

**Phase 2 Features (Planned):** - Transaction ID tracking with distributed RAM - True out-of-order write/read completion - B/R response routing via ID lookup tables - Performance monitoring hooks

**Phase 3 Features (Planned):** - Quality-of-Service (QoS) arbitration - Slave-side data width conversion - Optional pipeline registers for timing closure - Configurable arbitration algorithms (weighted, priority-based)

---

**See Also:** - [1.1 - Introduction](#) - Bridge overview and features - [2.4 - Channel-Specific Masters](#) - wr/rd/rw configuration - [3.2 - Arbiter FSMs](#) - Detailed FSM documentation

**Source Code:** - `rtl/bridge_axi4_flat_2x2.sv` - Example 2×2 crossbar (reference implementation)  
- `rtl/bridge_axi4_flat_4x4.sv` - Example 4×4 crossbar (larger scale)

## Bridge Arbiter Finite State Machines

### *Overview*

The Bridge AXI4 crossbar implements independent per-slave arbitration using simple 2-state finite state machines (FSMs). Each slave has two dedicated arbiters:

- **AW Arbiter** - Write Address channel arbitration
- **AR Arbiter** - Read Address channel arbitration

**Total FSM Count:**  $2 \times \text{NUM\_SLAVES}$

These arbiters provide fair, round-robin access to slave resources, preventing master starvation while maintaining simple, predictable behavior.

### *AW Channel Arbiter FSM*

**Purpose:** Arbitrate write address requests from multiple masters to a single slave

**States:** 2 (IDLE, GRANT\_ACTIVE)

**Algorithm:** Round-robin with grant locking



## *AW Arbitter FSM*

### **State Descriptions:**

**IDLE State:** - **Entry Actions:** -  $\text{aw\_grant\_matrix}[s] = 0$  - No grant issued -  $\text{aw\_grant\_active}[s] = 0$  - Arbiter inactive - **Behavior:** - Monitor  $\text{aw\_request\_matrix}[s]$  for master requests - Track round-robin priority via  $\text{aw\_last\_grant}[s]$  - Remain in IDLE until at least one master requests access

**GRANT\_ACTIVE State:** - **Entry Actions:** -  $\text{aw\_grant\_matrix}[s][m] = 1$  - Issue grant to winning master -  $\text{aw\_grant\_active}[s] = 1$  - Mark arbiter as active - **Behavior:** - Hold grant until AW handshake completes (AWVALID && AWREADY) - W channel data multiplexing follows AW grant - B channel response uses ID table (not grant-based)

### **State Transitions:**

| From         | To           | Condition                                                               | Description                           |
|--------------|--------------|-------------------------------------------------------------------------|---------------------------------------|
| IDLE         | GRANT_ACTIVE | $\backslash  $<br>$\text{aw\_request\_matrix}[s]$<br>$\backslash   > 0$ | At least one master requesting access |
| GRANT_ACTIVE | IDLE         | $\text{m\_axi\_awvalid}[s] \&\&$<br>$\text{m\_axi\_awready}[s]$         | AW handshake complete                 |

| From | To | Condition | Description |
|------|----|-----------|-------------|
| TIVE |    |           |             |

### *AR Channel Arbiter FSM*

**Purpose:** Arbitrate read address requests from multiple masters to a single slave

**States:** 2 (IDLE, GRANT\_ACTIVE)

**Algorithm:** Round-robin with grant locking (same as AW)



### *AR Arbiter FSM*

#### **State Descriptions:**

**IDLE State:** - **Entry Actions:** - `ar_grant_matrix[s] = 0` - No grant issued - `ar_grant_active[s] = 0` - Arbiter inactive - **Behavior:** - Monitor `ar_request_matrix[s]` for master requests - Track round-robin priority via `ar_last_grant[s]` - Independent from AW arbiter state

**GRANT\_ACTIVE State:** - **Entry Actions:** - `ar_grant_matrix[s][m] = 1` - Issue grant to winning master - `ar_grant_active[s] = 1` - Mark arbiter as active - **Behavior:** - Hold grant until AR handshake completes (`ARVALID && ARREADY`) - R

channel data multiplexing follows AR grant - R channel response uses ID table for routing

### State Transitions:

| From             | To               | Condition                               | Description                            |
|------------------|------------------|-----------------------------------------|----------------------------------------|
| IDLE             | GRANT_A<br>CTIVE | \ <br>ar_request_matrix[s]<br>\  > 0    | At least one master<br>requesting read |
| GRANT_AC<br>TIVE | IDLE             | m_axi_arvalid[s] &&<br>m_axi_arready[s] | AR handshake complete                  |

### *Round-Robin Arbitration Algorithm*

#### Algorithm Description:

Both AW and AR arbiters use the same fair round-robin arbitration algorithm:

Pseudocode:

1. Start search from  $(\text{last_grant}[s] + 1) \% \text{NUM_MASTERS}$
2. Search cyclically through all masters
3. Select first master with pending request
4. Update  $\text{last_grant}[s] = \text{winning\_master}$
5. Issue grant

#### Example with 4 Masters:

Initial state:  $\text{last_grant}[0] = 0$

Request Pattern:

```

Master 0: request
Master 1: request
Master 2: no request
Master 3: request

```

Arbitration Sequence:

```

Grant 1: Master 1 (start from master 1, first requester)
         last_grant = 1

```

```

Grant 2: Master 3 (start from master 2, find master 3)
         last_grant = 3

```

```

Grant 3: Master 0 (start from master 0, first requester)
         last_grant = 0

```

```

Grant 4: Master 1 (start from master 1, first requester)
         last_grant = 1

```

## Fairness Properties:

1. **No Starvation** - Every requesting master will eventually receive grant
2. **Priority Rotation** - Priority shifts after each grant
3. **Predictable Latency** - Maximum wait time = (NUM\_MASTERS - 1) × grant\_duration
4. **Equal Opportunity** - All masters treated equally

## Grant Locking Mechanism

**Purpose:** Prevent grant changes mid-transaction

### AW Grant Locking:

```
// AW grant locked until address handshake completes
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        aw_grant_active[s] <= 1'b0;
        aw_grant_matrix[s] <= '0;
    end else begin
        if (aw_grant_active[s]) begin
            // GRANT_ACTIVE state: Hold grant until handshake
            if (m_axi_awvalid[s] && m_axi_awready[s]) begin
                aw_grant_active[s] <= 1'b0; // Return to IDLE
                aw_grant_matrix[s] <= '0;
            end
        end else if (!aw_request_matrix[s]) begin
            // IDLE state: Arbitrate if requests pending
            aw_grant_matrix[s] <=
round_robin_select(aw_request_matrix[s]);
            aw_grant_active[s] <= 1'b1; // Enter GRANT_ACTIVE
        end
    end
end
end
```

### AR Grant Locking:

```
// AR grant locked until address handshake completes
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        ar_grant_active[s] <= 1'b0;
        ar_grant_matrix[s] <= '0;
    end else begin
        if (ar_grant_active[s]) begin
            // GRANT_ACTIVE state: Hold grant until handshake
            if (m_axi_arvalid[s] && m_axi_arready[s]) begin
                ar_grant_active[s] <= 1'b0; // Return to IDLE
                ar_grant_matrix[s] <= '0;
            end
        end
    end
end
end
```

```

        end else if (|ar_request_matrix[s]) begin
            // IDLE state: Arbitrate if requests pending
            ar_grant_matrix[s] <=
round_robin_select(ar_request_matrix[s]);
            ar_grant_active[s] <= 1'b1; // Enter GRANT_ACTIVE
        end
    end
end

```

### Why Grant Locking Matters:

1. **Protocol Compliance** - AXI4 spec requires stable AWID/ARID during handshake
2. **Data Integrity** - W channel must follow granted master's AW request
3. **Response Routing** - B/R channels need consistent transaction tracking
4. **Simplicity** - Single grant active per slave prevents mux conflicts

### *Independent Read/Write Arbitration*

**Key Design Feature:** AW and AR arbiters operate completely independently

### Benefits:

1. **No Head-of-Line Blocking:**
  - Read requests don't wait for write completion
  - Write requests don't wait for read completion
2. **Parallel Operation:**
  - Master 0 can write to Slave 0 (AW path)
  - Master 1 can read from Slave 0 (AR path)
  - Both transactions proceed simultaneously
3. **Resource Efficiency:**
  - Full utilization of read/write bandwidth
  - No artificial serialization

### Example Scenario:

Time T0: Master 0 issues write to Slave 0

- AW arbiter grants to Master 0
- AW\_GRANT\_ACTIVE = 1
- W data follows

Time T1: Master 1 issues read to Slave 0 (during Master 0 write)

- AR arbiter grants to Master 1 (independent!)
- AR\_GRANT\_ACTIVE = 1
- R data returns

Result: Read and write happen in parallel - no blocking

#### *FSM Instance Breakdown by Configuration*

**2×2 Configuration (2 masters, 2 slaves):** - Slave 0 AW Arbiter: 1 FSM - Slave 0 AR Arbiter: 1 FSM - Slave 1 AW Arbiter: 1 FSM - Slave 1 AR Arbiter: 1 FSM - **Total: 4 FSMs**

**4×4 Configuration (4 masters, 4 slaves):** - 4 slaves × 2 arbiters per slave = **8 FSMs**

**8×8 Configuration (8 masters, 8 slaves):** - 8 slaves × 2 arbiters per slave = **16 FSMs**

**Scalability:** - FSM count scales linearly with NUM\_SLAVES - Arbitration complexity scales with NUM\_MASTERS (search time) - Synthesis impact minimal for up to 16×16 configurations

#### *Performance Characteristics*

##### **Latency:**

- **Best Case:** 1 clock cycle (IDLE → GRANT\_ACTIVE → IDLE with immediate handshake)
- **Worst Case:** (NUM\_MASTERS) clock cycles (round-robin search + contention)
- **Average Case:** ~(NUM\_MASTERS / 2) clock cycles

##### **Throughput:**

- **Back-to-Back Grants:** Possible if different masters (no wait)
- **Same Master Repeated:** 1 cycle minimum between grants (FSM state change)

##### **Fairness:**

- **Guaranteed:** Every master gets grant within (NUM\_MASTERS - 1) arbitration cycles
- **No Priority:** All masters treated equally (can be extended for QoS)

#### *Comparison with Other Crossbar Arbiters*

| Feature       | Bridge Arbiter | APB Crossbar    | Commercial Tools |
|---------------|----------------|-----------------|------------------|
| <b>States</b> | 2 (IDLE,       | 1 (passthrough) | 3-5 (complex)    |

| Feature                 | Bridge Arbiter        | APB Crossbar     | Commercial Tools        |
|-------------------------|-----------------------|------------------|-------------------------|
|                         | GRANT_ACTIVE)         |                  |                         |
| <b>Algorithm</b>        | Round-robin           | N/A (APB is 1:1) | Priority, QoS, weighted |
| <b>Grant Locking</b>    | Yes (until handshake) | N/A              | Burst-aware locking     |
| <b>Read/Write Indep</b> | Yes (2 FSMs/slave)    | N/A              | Yes                     |
| <b>Complexity</b>       | Low                   | Minimal          | High                    |
| <b>Predictability</b>   | High (round-robin)    | N/A              | Medium (priority-based) |

### Simplicity Trade-off:

- **Advantages:**
  - Easy to verify (2-state FSM)
  - Predictable latency
  - Fair resource allocation
- **Limitations (future enhancements):**
  - No Quality-of-Service (QoS) support
  - No priority levels
  - No weighted arbitration

### Future Enhancements (Phase 3+)

#### QoS Support:

```
// Master QoS priority field (AXI4 optional)


```

#### Weighted Arbitration:

```
// Master weight configuration
parameter int MASTER_WEIGHTS [NUM_MASTERS] = '{1, 2, 1, 4};

// Grant frequency proportional to weight
// Master 3 gets 4x more grants than Master 0
```

---

### Burst-Aware Locking:

```
// Lock grant until entire burst completes
// Currently: Lock until address handshake
// Enhanced: Lock until WLAST (write) or RLAST (read)
```

---

**See Also:** - [1.1 - Introduction](#) - Bridge overview - [3.1 - Module Structure](#) - Generated RTL organization - [3.3 - Crossbar Core](#) - Internal crossbar instantiation

**Reference:** - ARM AMBA AXI4 Specification (IHI 0022E) - Section A7 (Arbitration)

---

**Next:** [3.3 - Crossbar Core](#)

## Chapter 3.7: Bridge ID Tracking for Response Routing

**Version:** 1.0 **Last Updated:** 2025-11-10 **Status:** Complete - Fully Implemented

---

### Overview

Bridge ID tracking is a critical component of the bridge crossbar architecture that enables correct response routing in multi-master configurations. It solves the fundamental problem of routing out-of-order slave responses back to the correct requesting master.

### The Problem

In a multi-master crossbar, transaction IDs alone are insufficient for response routing because:

1. **Multiple masters may use the same transaction ID**
  - Master 0 sends TID=5 to Slave 0
  - Master 1 sends TID=5 to Slave 1
  - When TID=5 response arrives, which master should receive it?

2. **Out-of-order responses from slaves**
  - Master 0 sends TID=5 to Slave 0, then TID=6 to Slave 1
  - Slave 1 responds first with TID=6
  - Slave 0 responds second with TID=5
  - Both responses must route back to Master 0
3. **Address-decode signals are not persistent**
  - Address decode happens on AW/AR channels (request path)
  - B/R responses arrive later, potentially in different order
  - Cannot use stale address decode for response routing

### The Solution: Bridge ID

Each master is assigned a unique **Bridge ID** (0, 1, 2, ..., N-1). When a request is issued:

1. Master adapter outputs constant `bridge_id_aw`/`bridge_id_ar` alongside transaction
2. Crossbar routes request to target slave based on address decode
3. Slave adapter stores (TID, `bridge_id`) mapping in CAM or FIFO
4. When slave responds with TID, adapter looks up original `bridge_id`
5. Crossbar routes response to master matching `bridge_id`

**Key Insight:** Bridge ID identifies the source master, enabling correct response routing regardless of transaction ID reuse or response ordering.

---

## Architecture Components

### 1. Master Adapters - Bridge ID Generation

Each master adapter has a unique BRIDGE\_ID parameter and outputs constant `bridge_id` signals.

**Parameters:**

```
module cpu_adapter #(
    parameter BRIDGE_ID = 0,                      // Unique master index (0,
1, 2, ...)                                         // $clog2(NUM_MASTERS)
    parameter BRIDGE_ID_WIDTH = 1,                  // ...
    parameter NUM_SLAVES = 2,
    ...
) (
    input logic aclk,
```

```



```

**Signal Flow:** - cpu\_adapter: BRIDGE\_ID=0 → bridge\_id\_aw=0, bridge\_id\_ar=0 -  
 - dma\_adapter: BRIDGE\_ID=1 → bridge\_id\_aw=1, bridge\_id\_ar=1 - accel\_adapter:  
 BRIDGE\_ID=2 → bridge\_id\_aw=2, bridge\_id\_ar=2

## 2. Slave Adapters - Transaction Tracking

Slave adapters implement CAM or FIFO tracking to store and retrieve bridge\_id based on slave configuration.

### Port Interface:

```

module ddr_adapter #(
    parameter int ID_WIDTH = 4,
    parameter int BRIDGE_ID_WIDTH = 2 // log2(num_masters)
) (
    input logic aclk,
    input logic aresetn,
// Crossbar interface (from crossbar)
    input logic [ID_WIDTH-1:0] xbar_ddr_axi_awid,
    input logic xbar_ddr_axi_awvalid,
    output logic xbar_ddr_axi_awready,
...
    output logic [ID_WIDTH-1:0] xbar_ddr_axi_bid,
    output logic xbar_ddr_axi_bvalid,
    input logic xbar_ddr_axi_bready,

```

```

// Bridge ID tracking inputs (from crossbar)


```

## Implementation Modes:

### Mode A: FIFO Tracking (In-Order Slaves)

Used when enable\_ooo = false (default). Assumes slave responses arrive in same order as requests.

### Write Channel FIFO:

```

// Write Channel FIFO (In-Order)
localparam WR_FIFO_DEPTH = 16;
logic [BRIDGE_ID_WIDTH-1:0] wr_fifo [WR_FIFO_DEPTH];
logic [$clog2(WR_FIFO_DEPTH):0] wr_ptr, rd_ptr;

// Push on AW handshake
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        wr_ptr <= '0;
    end else if (xbar_ddr_axi_awvalid && xbar_ddr_axi_awready) begin
        wr_fifo[wr_ptr[$clog2(WR_FIFO_DEPTH)-1:0]] <=
xbar_bridge_id_aw;
        wr_ptr <= wr_ptr + 1'b1;
    end
end

// Pop on B response
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        rd_ptr <= '0;
        bid_bridge_id <= '0;
        bid_valid <= 1'b0;
    end else if (xbar_ddr_axi_bvalid && xbar_ddr_axi_bready) begin
        bid_bridge_id <= wr_fifo[rd_ptr[$clog2(WR_FIFO_DEPTH)-1:0]];
        bid_valid <= 1'b1;
        rd_ptr <= rd_ptr + 1'b1;
    end
end

```

```

        end else begin
            bid_valid <= 1'b0; // Only valid for one cycle
        end
    end

```

### Read Channel FIFO:

```

// Read Channel FIFO (In-Order)
localparam RD_FIFO_DEPTH = 16;
logic [BRIDGE_ID_WIDTH-1:0] rd_fifo [RD_FIFO_DEPTH];
logic [$clog2(RD_FIFO_DEPTH):0] ar_ptr, r_ptr;

// Push on AR handshake
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        ar_ptr <= '0;
    end else if (xbar_ddr_axi_arvalid && xbar_ddr_axi_arready) begin
        rd_fifo[ar_ptr[$clog2(RD_FIFO_DEPTH)-1:0]] <=
xbar_bridge_id_ar;
        ar_ptr <= ar_ptr + 1'b1;
    end
end

// Pop on R response (with RLAST)
always_ff @(posedge aclk or negedge aresetn) begin
    if (!aresetn) begin
        r_ptr <= '0;
        rid_bridge_id <= '0;
        rid_valid <= 1'b0;
    end else if (xbar_ddr_axi_rvalid && xbar_ddr_axi_rready &&
xbar_ddr_axi_rlast) begin
        rid_bridge_id <= rd_fifo[r_ptr[$clog2(RD_FIFO_DEPTH)-1:0]];
        rid_valid <= 1'b1;
        r_ptr <= r_ptr + 1'b1;
    end else begin
        rid_valid <= 1'b0;
    end
end

```

**FIFO Mode Characteristics:** - **Area:** Minimal (~32 FFs for 16-deep x 2-bit FIFO) - **Latency:** Zero (combinational pop) - **Limitation:** Requires in-order responses - **Best For:** SRAM, BRAM, simple peripherals

### Mode B: CAM Tracking (Out-of-Order Slaves)

Used when enable\_ooo = true. Supports out-of-order responses via associative lookup.

### Write Channel CAM:

```

// Write Channel CAM
bridge_cam #(
    .TAG_WIDTH(ID_WIDTH),           // Transaction ID (4-8 bits)
    .DATA_WIDTH(BRIDGE_ID_WIDTH),    // Master index
    .DEPTH(16),                    // Outstanding transactions
    .ALLOW_DUPLICATES(1),          // Mode 2: 000 support
    .PIPELINE_EVICT(0)             // Combinational for now
) u_wr_cam (
    .clk(aclk),
    .rst_n(aresetn),

// Allocate on AW (store TID → bridge_id mapping)
    .allocate(xbar_ddr_axi_awvalid && xbar_ddr_axi_awready),
    .allocate_tag(xbar_ddr_axi_awid),
    .allocate_data(xbar_bridge_id_aw),

// Deallocate on B (lookup TID → bridge_id)
    .deallocate(xbar_ddr_axi_bvalid && xbar_ddr_axi_bready),
    .deallocate_tag(xbar_ddr_axi_bid),
    .deallocate_valid(bid_valid),
    .deallocate_data(bid_bridge_id),

// Status (unconnected)
    .cam_hit(),
    .tags_empty(),
    .tags_full()
);

```

### Read Channel CAM:

```

// Read Channel CAM
bridge_cam #(
    .TAG_WIDTH(ID_WIDTH),
    .DATA_WIDTH(BRIDGE_ID_WIDTH),
    .DEPTH(16),
    .ALLOW_DUPLICATES(1),
    .PIPELINE_EVICT(0)
) u_rd_cam (
    .clk(aclk),
    .rst_n(aresetn),

// Allocate on AR
    .allocate(xbar_ddr_axi_arvalid && xbar_ddr_axi_arready),
    .allocate_tag(xbar_ddr_axi_arid),
    .allocate_data(xbar_bridge_id_ar),

// Deallocate on R (with RLAST)
    .deallocate(xbar_ddr_axi_rvalid && xbar_ddr_axi_rready &&

```

```

    xbar_ddr_axi_rlast),
    .deallocate_tag(xbar_ddr_axi_rid),
    .deallocate_valid(rid_valid),
    .deallocate_data(rid_bridge_id),

    // Status
    .cam_hit(),
    .tags_empty(),
    .tags_full()
);

```

**CAM Mode Characteristics:** - **Area:** Moderate (parallel comparators for associative lookup) - **Latency:** 1 cycle (registered lookup) - **Capability:** Full out-of-order support - **Best For:** DDR controllers, HBM, PCIe, network interfaces

### Configuration:

```

[[bridge.slaves]]
name = "ddr"
prefix = "ddr_s_axi"
enable_ooo = true # Use CAM for OOO capability

[[bridge.slaves]]
name = "sram"
prefix = "sram_s_axi"
enable_ooo = false # Use FIFO for in-order (default)

```

## 3. Crossbar - Response Routing Logic

The crossbar uses bridge\_id from slave adapters to route responses back to correct masters.

### Signal Declarations:

```

module bridge_2x2_rw_xbar
    import bridge_2x2_rw_pkg::*;
#(
    parameter int BRIDGE_ID_WIDTH = 1
) (
    input logic aclk,
    input logic aresetn,

    // Master 0 (CPU) interfaces
    input logic [BRIDGE_ID_WIDTH-1:0] cpu_bridge_id_aw,
    input logic [BRIDGE_ID_WIDTH-1:0] cpu_bridge_id_ar,
    input logic [NUM_SLAVES-1:0]      cpu_slave_select_aw,
    input logic [NUM_SLAVES-1:0]      cpu_slave_select_ar,
    output axi4_b_t                  cpu_32b_b,
    output logic                      cpu_32b_bvalid,

```

```

output axi4_r_32b_t          cpu_32b_r,
output logic                cpu_32b_rvalid,

// Master 1 (DMA) interfaces
input logic [BRIDGE_ID_WIDTH-1:0] dma_bridge_id_aw,
input logic [BRIDGE_ID_WIDTH-1:0] dma_bridge_id_ar,
input logic [NUM_SLAVES-1:0]      dma_slave_select_aw,
input logic [NUM_SLAVES-1:0]      dma_slave_select_ar,
output axi4_b_t
output logic
output axi4_r_32b_t
output logic                dma_32b_b,
                                dma_32b_bvalid,
                                dma_32b_r,
                                dma_32b_rvalid,

// Slave 0 (DDR) tracking outputs
output logic [BRIDGE_ID_WIDTH-1:0] ddr_axi_bridge_id_aw,
output logic [BRIDGE_ID_WIDTH-1:0] ddr_axi_bridge_id_ar,
input logic [BRIDGE_ID_WIDTH-1:0] ddr_axi_bid_bridge_id,
input logic                  ddr_axi_bid_valid,
input logic [BRIDGE_ID_WIDTH-1:0] ddr_axi_rid_bridge_id,
input logic                  ddr_axi_rid_valid,

// Slave 1 (SRAM) tracking outputs
output logic [BRIDGE_ID_WIDTH-1:0] sram_axi_bridge_id_aw,
output logic [BRIDGE_ID_WIDTH-1:0] sram_axi_bridge_id_ar,
input logic [BRIDGE_ID_WIDTH-1:0] sram_axi_bid_bridge_id,
input logic                  sram_axi_bid_valid,
input logic [BRIDGE_ID_WIDTH-1:0] sram_axi_rid_bridge_id,
input logic                  sram_axi_rid_valid,
...
);

```

### Request Path Routing (Uses slave\_select):

```

// Forward bridge_id to slaves (route along with AW/AR)
assign ddr_axi_bridge_id_aw =
  (cpu_slave_select_aw[0] ? cpu_bridge_id_aw : '0) |
  (dma_slave_select_aw[0] ? dma_bridge_id_aw : '0);

assign sram_axi_bridge_id_aw =
  (cpu_slave_select_aw[1] ? cpu_bridge_id_aw : '0) |
  (dma_slave_select_aw[1] ? dma_bridge_id_aw : '0);

// Ready signals use address decode (slave_select)
assign cpu_32b_awready =
  (cpu_slave_select_aw[0] ? ddr_axi_awready : '0) |
  (cpu_slave_select_aw[1] ? sram_axi_awready : '0);

```

### Response Path Routing (Uses bridge\_id matching):

```

// CPU master (BRIDGE_ID = 0) - B channel response
assign cpu_32b_b.id =
    ((ddr_axi_bid_bridge_id == 0) && ddr_axi_bid_valid ? ddr_axi_bid : '0) |
    ((sram_axi_bid_bridge_id == 0) && sram_axi_bid_valid ? sram_axi_bid : '0);

assign cpu_32b_b.resp =
    ((ddr_axi_bid_bridge_id == 0) && ddr_axi_bid_valid ? ddr_axi_bresp : '0) |
    ((sram_axi_bid_bridge_id == 0) && sram_axi_bid_valid ? sram_axi_bresp : '0);

assign cpu_32b_bvalid =
    ((ddr_axi_bid_bridge_id == 0) && ddr_axi_bid_valid ? ddr_axi_bvalid : '0) |
    ((sram_axi_bid_bridge_id == 0) && sram_axi_bid_valid ? sram_axi_bvalid : '0);

// DMA master (BRIDGE_ID = 1) - B channel response
assign dma_32b_b.id =
    ((ddr_axi_bid_bridge_id == 1) && ddr_axi_bid_valid ? ddr_axi_bid : '0) |
    ((sram_axi_bid_bridge_id == 1) && sram_axi_bid_valid ? sram_axi_bid : '0);

assign dma_32b_bvalid =
    ((ddr_axi_bid_bridge_id == 1) && ddr_axi_bid_valid ? ddr_axi_bvalid : '0) |
    ((sram_axi_bid_bridge_id == 1) && sram_axi_bid_valid ? sram_axi_bvalid : '0);

```

### Key Design Decisions:

- Request Path:** Uses slave\_select (address decode)
  - Routing decision based on address
  - Determines which slave receives request
  - bridge\_id forwarded alongside request
- Response Path:** Uses bridge\_id matching
  - Routing decision based on stored bridge\_id
  - Determines which master receives response
  - Handles out-of-order, overlapping TIDs
- Valid Gating:** bid\_valid/rid\_valid signals
  - Slave adapter asserts valid for one cycle when bridge\_id available
  - Prevents false matches when no response active

- Critical for correct multi-master operation
- 

## Signal Flow Example

### Write Transaction Flow

**Scenario:** CPU (BRIDGE\_ID=0) writes to DDR, DMA (BRIDGE\_ID=1) writes to SRAM

#### Step 1: CPU Issues Write Request

```
cpu_adapter:  
  - Decodes address → slave_select_aw[0] = 1 (DDR)  
  - Outputs bridge_id_aw = 0 (constant)  
  - Outputs AW packet: awid=5, awaddr=0x8000_0000
```

#### Step 2: Crossbar Routes to DDR

```
bridge_2x2_rw_xbar:  
  - Routes cpu AW to ddr_axi based on cpu_slave_select_aw[0]  
  - Forwards ddr_axi_bridge_id_aw = cpu_bridge_id_aw = 0
```

#### Step 3: DDR Adapter Stores Mapping

```
ddr_adapter:  
  - AW handshake: xbar_ddr_axi_awvalid && xbar_ddr_axi_awready  
  - FIFO push: wr_fifo[wr_ptr] <= xbar_bridge_id_aw (0)  
  - Associates TID=5 → bridge_id=0
```

#### Step 4: External DDR Responds

```
External DDR controller:  
  - Sends B response: bid=5, bresp=OKAY
```

#### Step 5: DDR Adapter Retrieves bridge\_id

```
ddr_adapter:  
  - B handshake: xbar_ddr_axi_bvalid && xbar_ddr_axi_bready  
  - FIFO pop: bid_bridge_id <= wr_fifo[rd_ptr] (0)  
  - Outputs: bid_bridge_id=0, bid_valid=1
```

#### Step 6: Crossbar Routes Response to CPU

```
bridge_2x2_rw_xbar:  
  - Checks: ddr_axi_bid_bridge_id == 0? YES → route to CPU  
  - Checks: ddr_axi_bid_bridge_id == 1? NO → don't route to DMA  
  - Result: cpu_32b_bvalid = 1, dma_32b_bvalid = 0
```

## Concurrent Multi-Master Scenario

**Scenario:** Both CPU and DMA access DDR simultaneously with same TID

**Setup:** - CPU sends: TID=5 to DDR (BRIDGE\_ID=0) - DMA sends: TID=5 to DDR (BRIDGE\_ID=1) - DDR responds out-of-order: DMA response first, then CPU response

### Timeline:

| Cycle | Event                                  | DDR FIFO State                 | Response Routing               |
|-------|----------------------------------------|--------------------------------|--------------------------------|
| 0     | CPU AW:<br>TID=5,<br>bridge_id<br>=0   | wr_fifo[0] = 0                 | -                              |
| 1     | DMA AW:<br>TID=5,<br>bridge_id<br>=1   | wr_fifo[0] = 0, wr_fifo[1] = 1 | -                              |
| 5     | DMA B:<br>TID=5<br>(first<br>response) | Pop wr_fifo[0] = 0             | <b>WRONG without tracking!</b> |

**Without bridge\_id tracking:** - DMA response (TID=5) would be routed to CPU (first in FIFO) - INCORRECT - breaks both transactions

**With bridge\_id tracking (FIFO mode):** - FIFO strictly enforces order: first AW → first B - Relies on DDR responding in-order - Works for in-order slaves only

**With bridge\_id tracking (CAM mode):** - CAM stores both: {TID=5, bridge\_id=0} and {TID=5, bridge\_id=1} - DDR B response: bid=5 → CAM lookup → finds TWO entries! - **Requires additional slave\_id context or strict ordering**

**Best Practice:** Avoid same TID from multiple masters to same slave. Bridge uses master-side TID remapping to guarantee uniqueness.

---

## Configuration

### TOML Configuration

Enable out-of-order tracking per slave:

```
[bridge]
name = "bridge_2x2_rw"
description = "2 master x 2 slave with mixed 000 capability"
```

```

[[bridge.masters]]
name = "cpu"
prefix = "cpu_axi_"
channels = "rw"
id_width = 4
addr_width = 32
data_width = 32

[[bridge.masters]]
name = "dma"
prefix = "dma_axi_"
channels = "rw"
id_width = 4
addr_width = 32
data_width = 32

[[bridge.slaves]]
name = "ddr"
prefix = "ddr_s_axi"
enable_ooo = true # DDR supports out-of-order (use CAM)
base_addr = 0x80000000
addr_range = 0x80000000
id_width = 4
data_width = 32

[[bridge.slaves]]
name = "sram"
prefix = "sram_s_axi"
enable_ooo = false # SRAM is in-order only (use FIFO, default)
base_addr = 0x00000000
addr_range = 0x80000000
id_width = 4
data_width = 32

```

## Generator Parameter Passing

```

# Master adapter gets master_index for BRIDGE_ID
cpu_adapter = AdapterGenerator(
    bridge_name="bridge_2x2_rw",
    master_config=cpu_master_config,
    slaves=all_slaves,
    master_index=0 # CPU is master 0 → BRIDGE_ID=0
)

dma_adapter = AdapterGenerator(
    bridge_name="bridge_2x2_rw",
    master_config=dma_master_config,
    slaves=all_slaves,
    master_index=1 # DMA is master 1 → BRIDGE_ID=1

```

```

)
# Slave adapter gets enable_ooo from config
ddr_adapter = SlaveAdapterGenerator(
    bridge_name="bridge_2x2_rw",
    slave_config=ddr_slave_config, # enable_ooo=True
    channels="rw"
)

```

---

## Resource Utilization

### Area Estimates (per slave adapter)

**FIFO Mode (In-Order):** - Write FIFO: 16 entries x BRIDGE\_ID\_WIDTH bits = 16-32 FFs - Read FIFO: 16 entries x BRIDGE\_ID\_WIDTH bits = 16-32 FFs - Pointers: 2 x 5-bit counters = 10 FFs - Total: ~40-75 FFs

**CAM Mode (Out-of-Order):** - Write CAM: 16 entries x (ID\_WIDTH + BRIDGE\_ID\_WIDTH) = ~96 FFs - Comparators: 16 parallel comparators - Read CAM: Similar - Total: ~200 FFs + comparator logic

**Crossbar Overhead:** - bridge\_id wiring: Minimal (fanout from master adapters) - Response muxes: Combinational OR-based (existing)

### Timing Impact

**FIFO Mode:** - Pop operation: Combinational (registered FIFO output) - Critical path: FIFO read → response mux → master - Typical impact: +0.1 ns

**CAM Mode:** - Lookup operation: 1 cycle (registered) - Critical path: CAM comparators → response mux → master - Typical impact: +0.3-0.5 ns

**Recommendation:** Use FIFO mode by default (faster, smaller). Only enable CAM mode for slaves that genuinely support out-of-order responses.

---

## Debugging and Verification

### Simulation Waveform Signals

#### Master Adapter:

|                                  |                         |
|----------------------------------|-------------------------|
| cpu_adapter/bridge_id_aw         | - Constant 0 for CPU    |
| cpu_adapter/bridge_id_ar         | - Constant 0 for CPU    |
| cpu_adapter/slave_select_aw[1:0] | - Address decode output |

### Slave Adapter (FIFO mode):

|                               |                                |
|-------------------------------|--------------------------------|
| ddr_adapter/xbar_bridge_id_aw | - Input from crossbar          |
| ddr_adapter/wr_fifo[0:15]     | - FIFO contents                |
| ddr_adapter/wr_ptr            | - Write pointer                |
| ddr_adapter/rd_ptr            | - Read pointer                 |
| ddr_adapter/bid_bridge_id     | - Output to crossbar           |
| ddr_adapter/bid_valid         | - Valid flag (one cycle pulse) |

### Crossbar:

|                            |                           |
|----------------------------|---------------------------|
| xbar/cpu_bridge_id_aw      | - From CPU master adapter |
| xbar/dma_bridge_id_aw      | - From DMA master adapter |
| xbar/ddr_axi_bridge_id_aw  | - To DDR slave adapter    |
| xbar/ddr_axi_bid_bridge_id | - From DDR slave adapter  |
| xbar/cpu_32b_bvalid        | - Response valid to CPU   |
| xbar/dma_32b_bvalid        | - Response valid to DMA   |

## Common Issues and Solutions

**Issue 1: Response routed to wrong master - Symptom:** Master receives response for transaction it didn't issue - **Debug:** Check bid\_bridge\_id matches master's BRIDGE\_ID - **Cause:** FIFO depth exhausted, pointers wrapped incorrectly - **Fix:** Increase FIFO depth or add overflow detection

**Issue 2: Lost responses - Symptom:** Transaction completes at slave, but master never receives response - **Debug:** Check bid\_valid signal - should pulse for one cycle - **Cause:** bid\_valid not asserted or crossbar missed pulse - **Fix:** Extend bid\_valid or add holding register

**Issue 3: CAM lookup failures - Symptom:** deallocate\_valid=0 when expected - **Debug:** Check CAM allocate/deallocate tag matching - **Cause:** TID mismatch between AW and B (slave error) - **Fix:** Add error reporting, check slave compliance

---

## Performance Implications

### Latency

**FIFO Mode:** - AW → B path: +0 cycles (FIFO push/pop combinational) - AR → R path: +0 cycles (FIFO push/pop combinational)

**CAM Mode:** - AW → B path: +1 cycle (CAM lookup registered) - AR → R path: +1 cycle (CAM lookup registered)

**Comparison:** - Simple address-decode routing: 0 cycles (but INCORRECT for multi-master) - Bridge ID FIFO tracking: 0 cycles (CORRECT for in-order slaves) - Bridge ID CAM tracking: 1 cycle (CORRECT for out-of-order slaves)

## Throughput

No impact on maximum throughput - tracking operates in parallel with data path.

## Fmax

**FIFO Mode:** Negligible impact (~1-2% reduction) **CAM Mode:** Moderate impact (~5-10% reduction due to comparators)

---

## Future Enhancements

### TID Remapping (Planned)

To avoid TID collisions when multiple masters use same TID to same slave:

Master TID → Unique Crossbar TID → Slave sees unique TID  
Response:      Unique Crossbar TID → Original Master TID

Requires additional translation tables in master/slave adapters.

### Multi-Level CAMs

For very large systems (>8 masters, >100 outstanding transactions):

Level 1: Coarse CAM (master\_id)  
Level 2: Fine CAM per master (TID)

Reduces comparison fanout, improves timing.

### Configurable Depth

Allow FIFO/CAM depth to be configurable per slave:

```
[[bridge.slaves]]  
name = "ddr"  
enable_ooo = true  
tracking_depth = 32 # Deep queue for high-throughput slaves
```

---

## Summary

Bridge ID tracking is a **fully implemented, production-ready feature** that enables correct multi-master crossbar operation:

- **Master Adapters:** Generate unique constant bridge\_id per master
- **Slave Adapters:** Store TID→bridge\_id mapping (FIFO or CAM)
- **Crossbar:** Route responses using bridge\_id matching
- **Configuration:** Per-slave enable\_ooo controls CAM vs FIFO mode
- **Resource Cost:** Minimal (40-200 FFs per slave adapter)
- **Performance:** Zero latency (FIFO) or 1-cycle (CAM)

This architecture solves the fundamental multi-master response routing problem while maintaining low area and high performance.

---

**Related Chapters:** - [3.1: Module Structure](#) - Overall RTL organization - [3.2: Arbiter FSMs](#) - Request arbitration - [3.6: Signal Routing](#) - Internal signal connections

**Related Documentation:** - [../../../../bin/BRIDGE\\_ID\\_TRACKING\\_DESIGN.md](#) - Original design document (COMPLETE) - [../../../../BRIDGE\\_ARCHITECTURE.md](#) - Overall architecture - [rtl/bridge\\_cam.sv](#) - CAM module implementation

## Bridge: AXI4 Full Crossbar Generator - Product Requirements Document

**Project:** Bridge **Version:** 2.1 **Status:**  Phase 2 Complete - Simplified Architecture with Hard Limits **Created:** 2025-10-18 **Last Updated:** 2025-11-02

---

## Executive Summary

**Bridge** is a Python-based AXI4 crossbar generator that produces simple, performant SystemVerilog RTL for connecting multiple AXI4 masters to multiple AXI4 slaves. The name follows the infrastructure theme - bridges connect different regions, enabling communication across divides, just like crossbars connect masters and slaves.

**Design Philosophy:** A simple AMBA fabric that is performant, but makes no attempt to support all features. We enforce hard limits (8-bit ID width, 64-bit address width) to eliminate unnecessary complexity, focusing only on what real hardware needs.

**Key Differentiator from Delta:** - **Delta:** AXI-Stream crossbar (streaming data, single channel, simple routing) - **Bridge:** AXI4 full crossbar (memory-mapped, 5 channels, burst support, ID-based routing)

**Key Differentiator from Commercial IP:** - **Commercial AXI4 Crossbars:** Feature-complete, support every AXI4 edge case, complex configurability - **Bridge:** Simple and performant, supports common use cases, hard limits for simplicity

**Target Use Case:** High-performance memory-mapped interconnects for multi-core processors, accelerators, and memory controllers where simplicity and performance matter more than spec compliance.

---

## Design Philosophy

**Vision:** A simple AMBA fabric that is performant, but makes no attempt to support all features.

### Core Principles

- 1. Simplicity Over Completeness** - Focus on common use cases, not edge cases - Implement what's needed for real hardware, not AXI4 spec compliance theater - Prefer straightforward implementations over complex feature sets
- 2. Performance First** - Direct connections where possible (matching widths = zero overhead) - Minimal conversion logic in critical paths - Optimize for throughput and latency, not feature count

### 3. Hard Limits for Simplicity

We enforce uniform widths on certain parameters to eliminate complexity:

| Parameter       | Hard Limit     | Rationale                                                                                       |
|-----------------|----------------|-------------------------------------------------------------------------------------------------|
| <b>ID Width</b> | 8 bits (fixed) | Eliminates ID width conversion logic. 256 unique IDs is sufficient for all realistic use cases. |

| Parameter            | Hard Limit          | Rationale                                                                                |
|----------------------|---------------------|------------------------------------------------------------------------------------------|
| <b>Address Width</b> | 64 bits (fixed)     | Eliminates address width conversion. 64-bit addressing covers all memory-mapped systems. |
| <b>Data Width</b>    | Variable (32b-512b) | Width converters ONLY for data. Commonly needed for bandwidth optimization.              |

**Why This Works:** - ID width conversion adds complexity with minimal benefit (nobody needs >256 outstanding transactions per master) - Address width conversion is rarely needed (peripheral addresses fit in 32-bit, but using 64-bit everywhere is simpler) - Data width conversion is the ONLY width conversion that provides real value (bandwidth matching)

#### 4. What We DON'T Support (By Design)

Features intentionally excluded for simplicity: - ✗ Variable ID width per port - ✗ Variable address width per port - ✗ AXI4-Lite protocol variant (use standard AXI4 with len=0) - ✗ ACE protocol extensions (cache coherency) - ✗ AXI5 features - ✗ Complete AXI4 sideband signal support (QoS, Region, User signals declared but not routed)

#### 5. What We DO Support

Core AXI4 features that matter: - ✓ Full 5-channel AXI4 protocol (AW, W, B, AR, R) - ✓ Burst transactions (INCR, WRAP, FIXED) - ✓ Out-of-order completion via transaction IDs - ✓ Multiple outstanding transactions per master - ✓ Channel-specific masters (write-only, read-only, read-write) - ✓ Data width conversion (32b ↔ 64b ↔ 128b ↔ 256b ↔ 512b) - ✓ Configurable M×N topology (1-32 masters, 1-256 slaves) - ✓ Fair round-robin arbitration per slave

#### Architecture Philosophy

**Target:** Intelligent width-aware routing, not fixed-width crossbar

#### OLD Approach (What We're Moving Away From):

Master (64b) → Upsize(64→256) → Fixed 256b Crossbar → Downsize(256→64) → Slave (64b)

- Two conversions for same-width connections (wasteful!)
- Artificial bandwidth bottleneck
- Unnecessary logic and latency

### **NEW Approach (Target Architecture):**

Master (64b) → Direct Connection → Slave (64b) [0 conversions!]  
 Master (512b) → Conv(512→64) → Slave (64b) [1 conversion]

- Per-master paths to each unique slave width it connects to
- Direct paths where widths match (zero overhead)
- Converters only where actually needed
- Router selects appropriate path based on address decode

**Result:** Minimal conversion logic, maximum performance, pragmatic simplicity.

---

### **⚠ CRITICAL: RTL Regeneration Requirements**

**ALL generated RTL files MUST be deleted and regenerated together whenever ANY generator code changes.**

**Why This Matters:** - Generated RTL files may have interdependencies (bridges, wrappers, integrators) - Generator code changes can create version mismatches between files - Partial regeneration creates subtle incompatibilities that cause test failures - Even “small innocuous” generator changes can have cascading effects

#### **The Rule:**

```
# ✗ WRONG - Partial regeneration
./bridge_generator.py --masters 5 --slaves 3 --output ../rtl/
# Only regenerates bridge_axi4_flat_5x3.sv
# Other files (wrappers, integrators) now mismatched!

# ✓ CORRECT - Full regeneration
rm ../rtl/bridge_*.sv                                # Delete ALL generated
bridges
rm ../rtl/bridge_wrapper_*.sv                         # Delete ALL generated
wrappers
./regenerate_all_bridges.sh                            # Regenerate everything
together
```

**Generator Files That Trigger Full Regeneration:** - `bridge_generator.py` - Main bridge generator - `bridge_csv_generator.py` - CSV-based generator - `bridge_address_arbiter.py` - Address decode logic - `bridge_channel_router.py`

- Channel routing logic - `bridge_response_router.py` - Response routing logic -  
`bridge_amba_integrator.py` - AMBA component integration -  
`bridge_wrapper_generator.py` - Wrapper generation - **Any Python file** in  
projects/components/bridge/bin/

**Symptoms of Version Mismatch:** - Tests that previously passed now fail -  
Simulation errors about missing signals - Mismatched port widths or counts -  
Address decode routing to wrong slaves

**Think of Generated RTL Like Compiled Code:** When you update a compiler,  
you don't selectively recompile - you rebuild everything. When you update a  
generator, you don't selectively regenerate - you regenerate everything.

---

## 1. Product Overview

### 1.1 Purpose

Bridge provides automated generation of AXI4 crossbar infrastructure with:-  
**Python code generation** - Parameterized RTL generation (similar to APB/Delta) -  
**Performance modeling** - Analytical + simulation validation - **Flat topology** - Full  
M×N interconnect matrix - **ID-based routing** - Out-of-order transaction support -  
**Burst optimization** - Pipelined burst transfers

### 1.2 Target Audience

**Primary Users:** - RTL designers building SoC interconnect - System architects  
evaluating interconnect topologies - Verification engineers needing crossbar  
testbenches - Students learning AXI4 protocol and interconnects

**Educational Focus:** - Demonstrates AXI4 protocol complexity - Shows arbitration  
strategies for memory-mapped busses - Teaches ID-based transaction tracking -  
Illustrates burst optimization techniques

### 1.3 Success Criteria

**Functional:** - ✘ Generates working AXI4 crossbar RTL (`bridge_generator.py`) - ✘  
Generates CSV-configured bridges (`bridge_csv_generator.py`) - ✘ Passes Verilator  
lint - ✘ Supports 1-32 masters, 1-256 slaves - ✘ Handles out-of-order completion  
via IDs (`bridge_cam.sv`) - ✘ Supports burst lengths 1-256 beats - ✘ Channel-specific  
masters (wr/rd/rw) for resource optimization (Phase 2) - [ ] APB converter  
integration (Phase 3 pending)

**Performance:** - ✅ Latency  $\leq$  3 cycles for single-beat transactions - ✅ Throughput: All  $M \times N$  paths can transfer concurrently - ✅ Performance models implemented (bridge\_model.py - V1 Flat) - [ ] Fmax  $\geq$  300 MHz on UltraScale+ FPGAs (pending synthesis validation)

**Quality:** - ✅ Complete specifications (PRD) before code - ✅ Performance models validate requirements (bridge\_model.py) - ✅ All generated RTL Verilator verified - ✅ Integration examples provided (CSV examples) - ✅ Comprehensive documentation (BRIDGE\_CURRENT\_STATE.md, BRIDGE\_ARCHITECTURE\_DIAGRAMS.md)

## 1.4 Implementation Status and Phases

### Unified Generator (bridge\_generator.py):

The bridge generator now supports both TOML/CSV configuration and legacy array-indexed modes:

**Configuration Modes:** 1. **TOML/CSV Mode (Preferred)** - TOML port configuration (bridge\_name.toml) - CSV connectivity matrix (bridge\_name\_connectivity.csv) - Custom port prefixes (rapids\_m\_axi\_, cpu\_m\_axi\_, etc.) - Interface wrapper integration (timing isolation) - Mixed protocols (AXI4 + APB + AXI4-Lite slaves) - Channel-specific masters (wr/rd/rw) - Status: ✓ Phase 2 complete, Phase 3 pending

#### 2. Legacy CSV Mode (Backwards Compatible)

- Separate ports.csv and connectivity.csv files
- Migration path to TOML format
- See test\_configs/README.md for conversion guide

### Phase Status:

**Phase 1: CSV Configuration ✓ COMPLETE** - CSV parser (ports.csv, connectivity.csv) - Port generation with custom prefixes - Converter identification logic - Basic crossbar instantiation

**Phase 2: Channel-Specific Masters ✓ COMPLETE (2025-10-26)** - Added channels field to PortSpec (rw/wr/rd) - Conditional port generation based on channels - **Resource Optimization:** - wr (write-only): AW, W, B channels only → 39% port reduction - rd (read-only): AR, R channels only → 61% port reduction - rw (full): All 5 channels - Width converter awareness (only generate needed converters) - Example: 4-master bridge saves 35% signals with optimized channels

**Phase 3: APB Converter Integration**  **PENDING** - AXI2APB converter module (create or integrate existing) - APB signal packing/unpacking - APB converter instantiation in generated bridges - Width converter + APB converter chaining - End-to-end testing with APB slaves - Status: Placeholders in generated code with detailed TODO comments

**Additional Resources:** - `bridge_model.py` - Performance modeling (V1 Flat implemented) - `bridge_cam.sv` - Transaction ID tracking for OOO support - See `BRIDGE_CURRENT_STATE.md` for detailed review - See `docs/BRIDGE_ARCHITECTURE_DIAGRAMS.md` for visual architecture

---

## 2. Architecture Overview

### 2.1 AXI4 vs AXIS vs APB

| Feature                   | APB                  | AXI-Stream (Delta)       | AXI4 (Bridge)                |
|---------------------------|----------------------|--------------------------|------------------------------|
| <b>Protocol Type</b>      | Simpl e register bus | Streaming data           | Memory-mapped burst          |
| <b>Channels</b>           | 1 (address + data)   | 1 (data stream)          | 5 (AW, W, B, AR, R)          |
| <b>Request Generation</b> | Address range decode | TDEST decode             | Address range decode         |
| <b>Arbitration</b>        | Per-slave, per-cycle | Per-slave, packet-locked | Per-slave, per-address-phase |
| <b>Out-of-Order</b>       | No (sequential)      | No (streaming order)     | Yes (ID-based)               |
| <b>Burst Support</b>      | No                   | Packet (via TLAST)       | Yes (AWLEN/ARLEN)            |

| Feature           | APB               | AXI-Stream (Delta) | AXI4 (Bridge)     |
|-------------------|-------------------|--------------------|-------------------|
| <b>Complexity</b> | Low               | Medium             | <b>High</b>       |
| <b>Latency</b>    | 1-2 cycles        | 2 cycles           | 2-3 cycles        |
| <b>Use Case</b>   | Control registers | Data streaming     | Memory-mapped I/O |

**Bridge Complexity Sources:** 1. 5 independent channels requiring separate arbitration 2. ID-based routing for out-of-order completion 3. Burst handling with interleaving constraints 4. Write response tracking (match AW with B channel) 5. Address decode + ID muxing for response routing

## 2.2 Block Diagram





### 2.3 Key Components

- 1. Request Generation** - Address range decode for each slave - Generates  $M \times S$  request matrix per channel (AW, AR) - Similar to APB but more complex (2 address channels)
- 2. Per-Slave Arbitration - 5 separate arbiters per slave:** - AW channel arbiter (write address) - W channel arbiter (write data - locked to AW grant) - B channel arbiter (write response - routed by ID) - AR channel arbiter (read address) - R channel arbiter (read data - routed by ID) - Round-robin with burst locking - Separate read/write paths (no head-of-line blocking)
- 3. Data Multiplexing** - Mux master signals to selected slave - ID-based response routing (B, R channels) - Burst tracking (hold grant until xlast)
- 4. Transaction Tracking** - ID tables per slave for out-of-order support - Track {Master ID, Transaction ID} → Master mapping - Required for routing B/R channels back to correct master
- 5. Optional Performance Counters** - Transaction counts per master/slave - Arbitration conflict counts - Latency histograms

---

## 3. Functional Requirements

### 3.1 AXI4 Protocol Compliance

**FR-1: Full AXI4 Protocol Support** - Support all 5 AXI4 channels: AW, W, B, AR, R - Comply with AMBA AXI4 specification (ARM IHI 0022) - Support burst lengths: 1-256 beats (AWLEN/ARLEN = 0-255) - Support burst types: INCR, WRAP, FIXED - Support burst sizes: 1-128 bytes (AWSIZE/ARSIZE = 0-7)

**FR-2: Out-of-Order Transaction Support** - Route responses via ID matching - Maintain transaction ID integrity (AWID → BID, ARID → RID) - Support configurable ID width (1-16 bits) - Track up to  $2^{ID\_WIDTH}$  outstanding transactions per slave

**FR-3: Atomic Operations** - Support exclusive access (AWLOCK/ARLOCK) - Track exclusive monitor per slave - Generate BRESP/RRESP errors for failed exclusives

### 3.2 Address Decoding

**FR-4: Configurable Address Map** - Support  $M$  masters  $\times S$  slaves - Each slave has base address and size - Non-overlapping address ranges (verified at generation) - Default slave for unmapped addresses (optional)

### FR-5: Address Range Configuration

```
# Example address map
address_map = {
    0: {'base': 0x00000000, 'size': 0x40000000, 'name': 'DDR'},      #
    1GB
    1: {'base': 0x40000000, 'size': 0x00100000, 'name': 'SRAM'},      #
    1MB
    2: {'base': 0x50000000, 'size': 0x01000000, 'name': 'PCIE'},      #
    16MB
    3: {'base': 0x60000000, 'size': 0x00010000, 'name': 'Peripherals'}#
    # 64KB
}
```

### 3.3 Arbitration Strategy

**FR-6: Round-Robin Arbitration** - Fair bandwidth allocation (no starvation) - Separate arbiters for AW and AR channels - Burst locking: Grant held until xlast (WLAST/RLAST) - Configurable arbitration policy (round-robin default)

**FR-7: Read/Write Independence** - Separate read and write paths - No head-of-line blocking between read/write - Concurrent read and write to same slave (if slave supports)

### 3.4 Burst Handling

**FR-8: Burst Optimization** - Pipelined burst transfers (overlap address and data) - No artificial burst splitting - Full AXI4 burst protocol support

**FR-9: Interleaving Constraints** - W channel locked to AW grant master - R channel routed by transaction ID - Support ID-based interleaving (slave-dependent)

---

## 4. Non-Functional Requirements

### 4.1 Performance

**NFR-1: Latency - Single-beat read:**  $\leq 3$  cycles (address decode + arbitration + mux) - **Single-beat write:**  $\leq 3$  cycles (address decode + arbitration + mux) - **Burst transfer:** No additional latency per beat (pipelined)

**NFR-2: Throughput - Concurrent transfers:** All  $M \times S$  paths can transfer simultaneously - **Burst efficiency:** Line-rate data transfer after address phase - **No artificial stalls:** Crossbar adds no wait states beyond arbitration

**NFR-3: Fmax - Target:** 300-400 MHz on Xilinx UltraScale+ - **Registered outputs:** All slave outputs registered for timing closure - **Pipelineable:** Optional pipeline stages for  $>400$  MHz

### 4.2 Resource Usage (Estimated)

**M = 4 masters, S = 4 slaves, DATA\_WIDTH = 512, ADDR\_WIDTH = 64, ID\_WIDTH = 4:**

| Resource    | Flat Crossbar | Notes                               |
|-------------|---------------|-------------------------------------|
| <b>LUTs</b> | ~2,500        | Address decode + arbiters + mux     |
| <b>FFs</b>  | ~3,000        | Registered outputs + ID tables      |
| <b>BRAM</b> | 0             | Distributed RAM for small ID tables |

| Resource | Flat Crossbar | Notes                    |
|----------|---------------|--------------------------|
| DSP      | 0             | No arithmetic operations |

**Scaling:** ~150 LUTs per M×S connection

### 4.3 Quality Requirements

**NFR-4: Code Generation Quality** - Lint-clean SystemVerilog (Verilator) - Synthesizable (Vivado, Yosys, Design Compiler) - No vendor-specific primitives (technology-agnostic) - Clear structure (commented, readable)

**NFR-5: Verification** - CocoTB testbench framework - Transaction-level verification - Out-of-order test scenarios - Burst interleaving tests - >95% functional coverage

---

## 5. Interface Specifications

### 5.1 AXI4 Master Interfaces (M × 5 channels)

**Write Address Channel (AW):**

```
input logic [ADDR_WIDTH-1:0]      s_axi_awaddr [NUM_MASTERS];
input logic [ID_WIDTH-1:0]        s_axi_awid  [NUM_MASTERS];
input logic [7:0]                s_axi_awlen  [NUM_MASTERS]; // Burst
length-1
input logic [2:0]                s_axi_awsize [NUM_MASTERS]; // Burst
size
input logic [1:0]                s_axi_awburst [NUM_MASTERS]; //
INCR/WRAP/FIXED
input logic
Exclusive access
input logic [3:0]                s_axi_awcache [NUM_MASTERS]; // Cache
attributes
input logic [2:0]                s_axi_awprot [NUM_MASTERS]; //
Protection type
input logic                      s_axi_awvalid [NUM_MASTERS];
output logic                     s_axi_awready [NUM_MASTERS];
```

**Write Data Channel (W):**

```
input logic [DATA_WIDTH-1:0]    s_axi_wdata  [NUM_MASTERS];
input logic [DATA_WIDTH/8-1:0]   s_axi_wstrb  [NUM_MASTERS]; // Byte
strokes
input logic                      s_axi_wlast  [NUM_MASTERS]; // Last
beat
```

```

input logic                               s_axi_wvalid [NUM_MASTERS];
output logic                            s_axi_wready [NUM_MASTERS];

```

### Write Response Channel (B):

```

output logic [ID_WIDTH-1:0]      s_axi_bid     [NUM_MASTERS];
output logic [1:0]                s_axi_bresp   [NUM_MASTERS]; // 
OKAY/EXOKAY/SLVERR/DECERR
output logic
input logic                         s_axi_bvalid  [NUM_MASTERS];
                                         s_axi_bready [NUM_MASTERS];

```

### Read Address Channel (AR):

```

input logic [ADDR_WIDTH-1:0]    s_axi_araddr [NUM_MASTERS];
input logic [ID_WIDTH-1:0]      s_axi_arid   [NUM_MASTERS];
input logic [7:0]              s_axi_arlen   [NUM_MASTERS];
input logic [2:0]              s_axi_arsize  [NUM_MASTERS];
input logic [1:0]              s_axi_arburst [NUM_MASTERS];
input logic                   s_axi_arlock  [NUM_MASTERS];
input logic [3:0]              s_axi_arcache [NUM_MASTERS];
input logic [2:0]              s_axi_arprot  [NUM_MASTERS];
input logic                   s_axi_arvalid [NUM_MASTERS];
output logic                  s_axi_arready [NUM_MASTERS];

```

### Read Data Channel (R):

```

output logic [DATA_WIDTH-1:0]  s_axi_rdata  [NUM_MASTERS];
output logic [ID_WIDTH-1:0]    s_axi_rid    [NUM_MASTERS];
output logic [1:0]             s_axi_rrresp [NUM_MASTERS];
output logic                  s_axi_rlast  [NUM_MASTERS];
output logic                  s_axi_rvalid [NUM_MASTERS];
input logic                   s_axi_rready [NUM_MASTERS];

```

## 5.2 AXI4 Slave Interfaces ( $S \times 5$ channels)

Mirror of master interfaces, with M → S direction reversed.

### 5.3 Configuration Parameters

```

parameter int NUM_MASTERS = 4;           // Number of master
interfaces
parameter int NUM_SLAVES = 4;           // Number of slave
interfaces
parameter int DATA_WIDTH = 512;          // Data bus width (bits)
parameter int ADDR_WIDTH = 64;            // Address bus width (bits)
parameter int ID_WIDTH = 4;               // Transaction ID width
(bits)
parameter int MAX_BURST_LEN = 256;        // Maximum burst length
(beats)
parameter bit PIPELINE_OUTPUTS = 1;       // Register slave outputs
parameter bit ENABLE_COUNTERS = 1;         // Performance counters

```

---

## 6. Performance Modeling

### 6.1 Analytical Model

#### Latency Components (Flat Crossbar):

Single-Beat Read Latency:

1. Address Decode: 0 cycles (combinatorial)
2. Arbitration: 1 cycle (AR arbiter decision)
3. Address Mux: 0 cycles (combinatorial)
4. Slave Access: (slave-dependent, not crossbar)
5. Response Mux: 1 cycle (R channel ID lookup + mux)
6. Output Register: 1 cycle (optional, for timing closure)

Total (no pipeline): 2 cycles

Total (pipelined): 3 cycles

#### Burst Transfer Throughput:

After address phase completes, data transfer is line-rate:

- W channel: 1 beat/cycle (locked to AW grant)
- R channel: 1 beat/cycle (ID-routed from slave)

Example: 256-beat burst

- Address phase: 2-3 cycles (one-time)  
Data phase: 256 cycles (line-rate)  
Total: 258-259 cycles  
Efficiency: 98.8%

#### Concurrent Throughput:

Maximum concurrent transfers (4x4 crossbar):

- Read: 4 concurrent (one per master-slave pair)
- Write: 4 concurrent (one per master-slave pair)
- Total: 8 concurrent read+write (separate paths)

Throughput @ 100 MHz, 512-bit data:

- Per path: 100 MHz × 512 bits = 51.2 Gbps  
Total: 8 paths × 51.2 Gbps = 409.6 Gbps theoretical

### 6.2 Resource Scaling

#### LUT Usage Formula (empirical):

$$\text{LUTs} \approx 500 \text{ (base)} + 150 \times M \times S + 20 \times \text{ID\_TABLE\_DEPTH}$$

#### Example (4x4 crossbar, ID\_WIDTH=4, ID\_TABLE\_DEPTH=16 per slave):

$$\begin{aligned}
 \text{LUTs} &\approx 500 + 150 \times 16 + 20 \times 16 \times 4 \\
 &\approx 500 + 2,400 + 1,280 \\
 &\approx 4,180 \text{ LUTs}
 \end{aligned}$$

### 6.3 Comparison with Other Crossbars

| Crossbar Type                 | Latency    | Throughput          | Complexity  | Use Case          |
|-------------------------------|------------|---------------------|-------------|-------------------|
| <b>APB</b>                    | 1-2 cycles | Low<br>(serialized) | Low         | Control registers |
| <b>AXI-Stream<br/>(Delta)</b> | 2 cycles   | High<br>(streaming) | Medium      | Data streaming    |
| <b>AXI4 (Bridge)</b>          | 2-3 cycles | High (burst)        | <b>High</b> | Memory-mapped I/O |
| <b>AXI4 + Slices</b>          | 4-6 cycles | High (burst)        | Very High   | >400 MHz designs  |

**Bridge Sweet Spot:** High-performance memory-mapped interconnects where out-of-order and burst efficiency are critical.

---

## 7. Generator Architecture

### 7.1 Python Generator Structure

```

class BridgeGenerator:
    """AXI4 crossbar RTL generator"""

    def __init__(self, config):
        self.num_masters = config.num_masters
        self.num_slaves = config.num_slaves
        self.data_width = config.data_width
        self.addr_width = config.addr_width
        self.id_width = config.id_width
        self.address_map = config.address_map

    def generate_address_decode(self) -> str:
        """Generate address range decode logic"""
        # For each master x slave, check if address in range
        # More complex than AXIS (uses address), simpler than full
        decode

    def generate_aw_arbiter(self, slave_idx) -> str:
        """Generate write address channel arbiter for one slave"""
        # Round-robin arbiter
        # Grants locked until corresponding B response completes

```

```

def generate_ar_arbiter(self, slave_idx) -> str:
    """Generate read address channel arbiter for one slave"""
    # Round-robin arbiter
    # Grants locked until corresponding R response completes
(RLAST)

def generate_w_channel_mux(self, slave_idx) -> str:
    """Generate write data channel multiplexer"""
    # W channel follows AW grant (locked until WLAST)

def generate_b_channel_demux(self, slave_idx) -> str:
    """Generate write response channel demultiplexer"""
    # Route by ID: lookup master from {slave_idx, BID}

def generate_r_channel_demux(self, slave_idx) -> str:
    """Generate read data channel demultiplexer"""
    # Route by ID: lookup master from {slave_idx, RID}

def generate_id_table(self, slave_idx) -> str:
    """Generate transaction ID tracking table"""
    # Maps {slave, transaction_id} → master_id
    # Indexed on AW/AR grant, looked up on B/R response

def generate_crossbar(self) -> str:
    """Generate complete crossbar module"""
    # Instantiate all arbiters, muxes, demuxes, ID tables

```

## 7.2 Address Map Configuration

### Python Configuration:

```

bridge_config = {
    'num_masters': 4,
    'num_slaves': 4,
    'data_width': 512,
    'addr_width': 64,
    'id_width': 4,
    'address_map': {
        0: {'base': 0x00000000, 'size': 0x40000000, 'name': 'DDR'},
        1: {'base': 0x40000000, 'size': 0x00100000, 'name': 'SRAM'},
        2: {'base': 0x50000000, 'size': 0x01000000, 'name': 'PCIE'},
        3: {'base': 0x60000000, 'size': 0x00010000, 'name': 'PCIe'}
    },
    'Peripherals': []
},
    'pipeline_outputs': True,
    'enable_counters': True
}

```

## Generated Address Decode:

```
// Address decode for each master
always_comb begin
    for (int s = 0; s < NUM_SLAVES; s++)
        aw_request_matrix[s] = '0;

    for (int m = 0; m < NUM_MASTERS; m++) begin
        if (s_axi_awvalid[m]) begin
            // Slave 0: DDR (0x00000000 - 0x3FFFFFF)
            if (s_axi_awaddr[m] >= 64'h00000000 &&
                s_axi_awaddr[m] < 64'h40000000)
                aw_request_matrix[0][m] = 1'b1;

            // Slave 1: SRAM (0x40000000 - 0x40FFFFFF)
            if (s_axi_awaddr[m] >= 64'h40000000 &&
                s_axi_awaddr[m] < 64'h40100000)
                aw_request_matrix[1][m] = 1'b1;

            // ... slaves 2-3 ...
        end
    end
end
```

---

## 8. Comparison with APB and Delta

### 8.1 Code Reuse from APB Generator

**Similar Components (~70% reuse):** - Address range decode logic (same pattern, different signals) - Round-robin arbiter (same algorithm) - Data multiplexing pattern (same approach) - Backpressure handling (similar to PREADY)

**New Components for Bridge:** - 5× the arbiters (AW, W, B, AR, R instead of single channel) - **ID-based routing** (B and R channel demuxing) - **Transaction tracking** (ID tables for out-of-order) - **Burst handling** (grant locking until xlast)

**Migration Effort from APB:** - ~120 minutes (vs ~75 min for AXIS, due to higher complexity) - Most time: ID table logic and response demuxing

### 8.2 Code Reuse from Delta Generator

**Similar Components (~60% reuse):** - Python generation framework - Command-line interface - Performance modeling structure - Arbitration pattern (round-robin with locking)

**Key Differences:** - 5 channels vs 1 channel (Delta only has TDATA/TVALID/TREADY/TLAST) - **ID-based routing** vs TDEST-based routing - **Address decode** vs TDEST decode (Bridge more complex) - **Transaction tracking** vs packet atomicity (different mechanisms)

### 8.3 Complexity Comparison

| Metric                       | APB Crossbar   | Delta (AXIS)    | Bridge (AXI4)          |
|------------------------------|----------------|-----------------|------------------------|
| <b>Channels to arbitrate</b> | 1              | 1               | 5 ★                    |
| <b>Request generation</b>    | Address ranges | TDEST decode    | Address ranges         |
| <b>Response routing</b>      | Grant-based    | Grant-based     | ID-based ★             |
| <b>Burst support</b>         | No             | Packet (T LAST) | Yes (AWLEN/ARLEN)<br>★ |
| <b>Out-of-order</b>          | No             | No              | Yes (ID tables) ★      |
| <b>Transaction tracking</b>  | No             | No              | Yes ★                  |
| <b>Lines of Python</b>       | ~500           | ~697            | ~900 (est.)            |
| <b>Lines of generated SV</b> | ~200 (4×4)     | ~250 (4×4)      | ~400 (4×4) (est.)      |

★ = Additional complexity in Bridge

## 9. Use Cases

### 9.1 Multi-Core Processor Interconnect

**Scenario:** 4 CPU cores + GPU accessing DDR + SRAM + Peripherals

Configuration:

Masters: 5 (4 CPUs, 1 GPU)  
Slaves: 3 (DDR, SRAM, Peripherals)

Data: 512-bit (cache line width)  
Address: 64-bit (large memory space)  
ID: 4-bit (up to 16 outstanding per master)

**Benefits:** - Concurrent access to all slaves - Out-of-order completion for high-performance CPUs - Burst transfers for cache line fills - Separate read/write paths (no head-of-line blocking)

## 9.2 DMA + Accelerator System

**Scenario:** DMA engine + 4 accelerators accessing shared memory

Configuration:

Masters: 5 (1 DMA, 4 accelerators)  
Slaves: 2 (DDR, Control registers)  
Data: 512-bit (high-bandwidth DMA)  
Address: 32-bit (limited address space)  
ID: 2-bit (simple ID space)

**Benefits:** - High-bandwidth memory access for DMA - Fair arbitration prevents accelerator starvation - Control register access doesn't block data transfers

## 9.3 FPGA System Integration

**Scenario:** MicroBlaze CPU + custom accelerators + memory controllers

Configuration:

Masters: 8 (1 CPU, 7 accelerators)  
Slaves: 4 (DDR, BRAM, AXI GPIO, AXI DMA)  
Data: 128-bit (AXI4 standard width)  
Address: 32-bit (standard FPGA address space)  
ID: 4-bit (moderate outstanding transactions)

**Benefits:** - Standard AXI4 interfaces (Xilinx IP compatibility) - Scalable to many masters/slaves - Performance counters for profiling

---

# 10. Testing Strategy

## 10.1 FUB (Functional Unit Block) Tests

**Address Decode Tests:** - Verify all address ranges correctly decoded - Test boundary conditions (base, base+size-1) - Test unmapped addresses (error response)

**Arbiter Tests:** - Round-robin fairness (all masters get turns) - Burst locking (grant held until xlast) - Starvation prevention

**ID Table Tests:** - Correct ID → master mapping - Out-of-order transaction handling - Table full condition

**Mux/Demux Tests:** - Data integrity through crossbar - Response routing to correct master - Concurrent transfers don't interfere

## 10.2 Integration Tests

**Single-Master, Single-Slave:** - Basic read/write transactions - Burst transfers (various lengths) - Out-of-order completions

**Multi-Master, Single-Slave:** - Arbitration correctness - Fairness verification - Burst interleaving (if supported)

**Multi-Master, Multi-Slave:** - Concurrent transfers - No crosstalk between paths - Full M×S matrix coverage

**Stress Tests:** - All masters active simultaneously - Maximum burst lengths - Full ID space utilization - Back-to-back bursts

## 10.3 Performance Validation

**Latency Measurement:** - Single-beat read: measure actual vs analytical - Single-beat write: measure actual vs analytical - Compare with specification ( $\leq 3$  cycles)

**Throughput Measurement:** - Burst transfer efficiency (should be ~98%) - Concurrent path throughput (should be line-rate  $\times$  M×S) - Compare with theoretical maximum

**Resource Validation:** - Synthesize for Xilinx UltraScale+ - Compare LUT/FF usage with estimates - Verify Fmax  $\geq$  300 MHz

---

# 11. Documentation Plan

## 11.1 Specifications (Before Code)

- **PRD.md** - This document (complete requirements)
- **README.md** - User guide with quick start
- **BRIDGE\_VS\_APB\_GENERATOR.md** - Migration guide from APB
- **BRIDGE\_VS\_DELTA\_GENERATOR.md** - Comparison with Delta
- **AXI4\_PROTOCOL\_GUIDE.md** - AXI4 primer for students

## 11.2 Performance Analysis (Before Implementation)

- **bin/bridge\_performance\_model.py** - Analytical model
  - Latency calculations (single-beat, burst)
  - Throughput estimates (concurrent paths)
  - Resource estimates (LUT/FF scaling)
- **bin/bridge\_simulator.py** - Discrete event simulation (optional)
  - Cycle-accurate modeling
  - Traffic pattern support
  - Validation against RTL

## 11.3 Code Generation

- **bin/bridge\_generator.py** - Main RTL generator
  - Command-line interface
  - Address map configuration
  - Generated RTL output

## 11.4 Verification

- **dv/tests/** - CocoTB testbenches
    - FUB tests (individual blocks)
    - Integration tests (multi-block)
    - Stress tests (corner cases)
- 

## 12. Success Metrics

### 12.1 Functional Completeness

- Generates working AXI4 crossbar RTL
- Passes all CocoTB tests
- Verilator lint clean
- Xilinx Vivado synthesis clean

### 12.2 Performance Targets

- Latency  $\leq$  3 cycles (measured in simulation)
- Throughput = line-rate  $\times$  M $\times$ S paths (measured)
- Fmax  $\geq$  300 MHz (post-synthesis)

### 12.3 Educational Value

- Complete specifications demonstrating rigor

- Performance models validate requirements
- Clear code structure (readable generated RTL)
- Integration examples provided

#### 12.4 Reusability

- ~70% code reuse from APB generator (measured by LOC)
  - Similar patterns to Delta generator
  - Can be extended (weighted arbitration, QoS, etc.)
- 

## 12. Attribution and Contribution Guidelines

### 12.1 Git Commit Attribution

When creating git commits for Bridge documentation or implementation:

**Use:**

Documentation and implementation support by Claude.

**Do NOT use:**

Co-Authored-By: Claude <noreply@anthropic.com>

**Rationale:** Bridge documentation and organization receives AI assistance for structure and clarity, while design concepts and architectural decisions remain human-authored.

---

### 12.2 PDF Generation Location

**IMPORTANT: PDF files should be generated in the docs directory:**

/mnt/data/github/rtldesignsherpa/projects/components/bridge/docs/

**Quick Command:** Use the provided shell script:

```
cd /mnt/data/github/rtldesignsherpa/projects/components/bridge/docs  
./generate_pdf.sh
```

The shell script will automatically: 1. Use the md\_to\_docx.py tool from bin/ 2. Process the bridge\_spec index file 3. Generate both DOCX and PDF files in the docs/ directory 4. Create table of contents and title page

 **See:** bin/md\_to\_docx.py for complete implementation details

---

## 13. Future Enhancements

### 13.1 Short-Term (Post-Initial Release)

- **Optional pipeline stages** - For Fmax >400 MHz
- **Weighted arbitration** - QoS support
- **Default slave** - Unmapped address handling
- **Exclusive monitor** - Full atomic operation support

### 13.2 Long-Term

- **Tree topology** - Hierarchical crossbar (like Delta)
  - **AXI4-Lite variant** - Simplified for control registers
  - **ACE support** - Coherent cache interconnect
  - **GUI configurator** - Visual address map setup
- 

## 14. Risk Assessment

| Risk                             | Probability | Impact | Mitigation                                       |
|----------------------------------|-------------|--------|--------------------------------------------------|
| <b>ID table complexity</b>       | Medium      | High   | Start with small ID_WIDTH (2-4), test thoroughly |
| <b>Out-of-order corner cases</b> | High        | High   | Extensive CocoTB tests with random delays        |
| <b>Fmax below target</b>         | Low         | Medium | Optional pipeline stages for timing closure      |
| <b>Resource usage exceeds</b>    | Low         | Low    | Empirical formulas guide expectations            |
| <b>Burst interleaving bugs</b>   | Medium      | High   | Separate test suite for burst scenarios          |

## 15. Project Timeline (Estimated)

**Week 1: Specifications and Models** - [x](#) Day 1-2: PRD.md (complete) - [ ] Day 3-4: Performance modeling (analytical) - [ ] Day 5-7: README.md, migration guides

**Week 2-3: Core Implementation** - [ ] Day 1-3: Address decode + arbiter generation - [ ] Day 4-5: Data mux/demux generation - [ ] Day 6-8: ID table generation - [ ] Day 9-10: Integration and testing

**Week 4: Verification and Examples** - [ ] Day 1-5: CocoTB testbenches - [ ] Day 6-7: Integration examples

---

## 16. References

**AXI4 Specifications:** - ARM IHI 0022 - AMBA AXI and ACE Protocol Specification - Xilinx UG1037 - Vivado AXI Reference Guide

**Related Projects:** - **APB Crossbar** - Simple register bus crossbar (existing) - **Delta (AXIS Crossbar)** - Streaming data crossbar (projects/components/delta/) - **RAPIDS** - DMA engine with AXI4 masters (rtl/miop)

**Tools:** - Verilator - RTL linting and simulation - CocoTB - Python-based verification - Xilinx Vivado - FPGA synthesis

---

## 17. Glossary

- **AXI4:** Advanced eXtensible Interface version 4 (AMBA standard)
  - **Burst:** Multi-beat transaction ( $\text{AWLEN}/\text{ARLEN} > 0$ )
  - **Crossbar:** Full  $M \times N$  interconnect matrix
  - **ID:** Transaction identifier for out-of-order support
  - **Out-of-order:** Responses can return in different order than requests
  - **xlast:** WLAST (write) or RLAST (read) - last beat indicator
- 

**Version:** 1.0 **Status:** ✓ Specification Complete - Ready for Implementation **Next Steps:** Create performance models, then implement generator

**Project Bridge - Connecting masters and slaves across the divide** 

# Claude Code Guide: Bridge Subsystem

**Version:** 2.1 **Last Updated:** 2025-11-03 **Purpose:** AI-specific guidance for working with Bridge subsystem

---



## CRITICAL: Read Architecture Document First

**Before making ANY changes to bridge generator or understanding signal flow:**

**READ:** [projects/components/bridge/docs/BRIDGE\\_ARCHITECTURE.md](projects/components/bridge/docs/BRIDGE_ARCHITECTURE.md)

This document contains the **definitive bridge architecture** including: - Correct signal flow (wrappers → decoder → converters → crossbar → slaves) - Component purposes and placement - Common misconceptions that previous agents made - Why there's NO fixed crossbar width

**If you skip this document, you WILL make incorrect assumptions.**

---

## Quick Context

**What:** Bridge - Two complementary AXI4 Crossbar Generators (framework-based and CSV-based) **Status:** Phase 2 Complete - CSV generator with channel-specific masters (wr/rd/rw) **Your Role:** Help users configure CSV files, generate bridges, understand architecture, and create tests

**Complete Documentation (Read in This Order):** 1.

[projects/components/bridge/docs/BRIDGE\\_ARCHITECTURE.md](projects/components/bridge/docs/BRIDGE_ARCHITECTURE.md) ← **START HERE**  
(architecture reference) 2. <projects/components/bridge/PRD.md> ← Product requirements 3. [projects/components/bridge/BRIDGE\\_CURRENT\\_STATE.md](projects/components/bridge/BRIDGE_CURRENT_STATE.md) ← Current implementation review 4.

[projects/components/bridge/docs/BRIDGE\\_ARCHITECTURE\\_DIAGRAMS.md](projects/components/bridge/docs/BRIDGE_ARCHITECTURE_DIAGRAMS.md) ← Visual architecture diagrams 5.

[projects/components/bridge/CSV\\_BRIDGE\\_STATUS.md](projects/components/bridge/CSV_BRIDGE_STATUS.md) ← CSV generator status (Phase 1 & 2) 6.

[projects/components/bridge/docs/bridge\\_spec/bridge\\_index.md](projects/components/bridge/docs/bridge_spec/bridge_index.md) ← Detailed specification

---

## Target Architecture: Intelligent Width-Aware Routing

**Core Principle:** Direct connections where possible, converters only where needed, no fixed crossbar width.

### Efficient Multi-Width Design

Master\_A (64b)

|                                 |                                  |
|---------------------------------|----------------------------------|
| └ Direct → Slave_0 (64b)        | [0 conversions, minimal latency] |
| └ Conv(64→128) → Slave_1 (128b) | [1 conversion]                   |
| └ Conv(64→512) → Slave_2 (512b) | [1 conversion]                   |

Master\_B (512b)

|                                  |                                 |
|----------------------------------|---------------------------------|
| └ Conv(512→64) → Slave_0 (64b)   | [1 conversion]                  |
| └ Conv(512→128) → Slave_1 (128b) | [1 conversion]                  |
| └ Direct → Slave_2 (512b)        | [0 conversions, full bandwidth] |

**Router Logic:** Address decoder determines target slave, selects correctly-sized path for that master-slave pair.

### Why This Architecture

#### ✗ Naive Fixed-Width Approach (Don't Do This):

Master (64b) → Upsize(64→256) → Fixed 256b Crossbar → Downsize(256→64) → Slave (64b)

- TWO conversions for same-width connections (wasteful!)
- Reduced bandwidth on narrow paths
- Unnecessary logic and area
- Higher latency

#### ✓ Intelligent Routing (Target):

Master (64b) → Router → Direct Connection → Slave (64b)

- ZERO conversions for matching widths
- Full native bandwidth
- Minimal logic
- Lowest latency

### Per-Master Output Paths

Each master has N output paths (one per unique slave width it connects to):

```
// Master_A connects to slaves at 64b, 128b, 512b  
// Generate 3 output paths:
```

```

logic [63:0] master_a_64b_wdata; // For 64b slaves (direct)
logic [127:0] master_a_128b_wdata; // For 128b slaves (via converter)
logic [511:0] master_a_512b_wdata; // For 512b slaves (via converter)

// Router selects based on address decode:
always_comb begin
    case (decoded_slave_id)
        SLAVE_0: select master_a_64b_wdata; // Slave_0 is 64b
        SLAVE_1: select master_a_128b_wdata; // Slave_1 is 128b
        SLAVE_2: select master_a_512b_wdata; // Slave_2 is 512b
    endcase
end

```

## Benefits

1. **Resource Efficient** - Only instantiate converters actually needed
2. **Maximum Performance** - Direct paths have zero conversion overhead
3. **Optimal Bandwidth** - No artificial width bottlenecks
4. **Lower Latency** - Minimal logic in critical path for matching widths
5. **Scalable** - Works for any combination of master/slave widths

## Implementation Status

**Current State:** Fixed-width crossbar with master-side upsizing (Phase 1 architecture)

**Target State:** Intelligent per-master multi-width routing (your vision) **Migration:** Requires generator architecture rework (see TASKS.md)

---



## CRITICAL RULE #0: RTL Regeneration Requirements



**READ THIS FIRST - FAILURE TO FOLLOW CAUSES TEST FAILURES**

### The Golden Rule

**ALL generated RTL files MUST be deleted and regenerated together whenever ANY generator code changes.**

### Why This Is Non-Negotiable

Generated RTL files have interdependencies: - bridge\_axi4\_flat\_\*.sv may be instantiated by bridge\_ooo\_with\_arbiter.sv - bridge\_wrapper\_\*.sv may wrap bridge\_axi4\_flat\_\*.sv - Generator changes affect signal names, port widths, interface structure - **Partial regeneration creates version mismatches that cause silent failures**

## Real Example (This Session)

# ✗ WHAT WENT WRONG:

1. Updated bridge\_address\_arbiter.py (address decode logic)
2. Regenerated ONLY bridge\_axi4\_flat\_5x3.sv
3. Did NOT regenerate bridge\_ooo\_with\_arbiter.sv (wrapper)
4. Result: All tests that were passing now FAIL
5. Cause: Version mismatch between wrapper and core bridge

# ✓ WHAT SHOULD HAVE BEEN DONE:

1. Updated bridge\_address\_arbiter.py
2. Delete ALL generated files:  
rm rtl/bridge\_axi4\_flat\_\*.sv  
rm rtl/bridge\_ooo\_\*.sv  
rm rtl/bridge\_wrapper\_\*.sv
3. Regenerate ALL bridges from scratch
4. Run ALL tests to verify

## Generator Files That Trigger Full Regeneration

Any change to these files requires regenerating ALL bridges:

- ✓ bridge\_generator.py - Core bridge generator
- ✓ bridge\_csv\_generator.py - CSV-based generator
- ✓ bridge\_address\_arbiter.py - Address decode logic
- ✓ bridge\_channel\_router.py - Channel routing
- ✓ bridge\_response\_router.py - Response routing
- ✓ bridge\_amba\_integrator.py - AMBA integration
- ✓ bridge\_wrapper\_generator.py - Wrapper generation
- ✓ ANY Python file in projects/components/bridge/bin/

## The Regeneration Workflow

# Step 1: Make generator code changes

vim bridge\_address\_arbiter.py

# Step 2: Delete ALL generated RTL (be aggressive!)

```
cd projects/components/bridge/rtl
rm bridge_axi4_flat_*.sv
rm bridge_ooo*.sv
rm bridge_wrapper_*.sv
# Verify deletion
ls *.sv # Should only show manually-written files like bridge_cam.sv
```

# Step 3: Regenerate everything

```
cd ../bin
./regenerate_all_bridges.sh # If script exists
# OR manually regenerate each topology needed
```

```
# Step 4: Run ALL tests
cd ./dv/tests
pytest -v # ALL tests, not just the one you think changed

# Step 5: Verify git diff makes sense
git diff ..rtl/ # Review all changes
```

## Symptoms of Version Mismatch

If you see these symptoms, you probably did partial regeneration:

- ✗ Tests that previously passed now fail
  - ✗ “Signal not found” errors in simulation
  - ✗ Port width mismatches in instantiation
  - ✗ Address decode routing to wrong slaves
  - ✗ Missing debug signals (dbg\_\*)
  - ✗ Unexpected compile errors in working code

# Think Like a Compiler Developer

### **Generated RTL = Compiled Object Files**

When you update a compiler, you don't selectively recompile - you make clean & make all.

When you update a generator, you don't selectively regenerate - you **delete all** and **regenerate all**.

## Exception: Hand-Written RTL

These files are **never** regenerated:

- `bridge_cam.sv` - CAM module (hand-written)
- Any file in `rtl/` that is NOT generated

Check file headers - generated files say “Generated by: bridge\_generator.py”



## **MANDATORY: Project Organization Pattern**

**THIS SUBSYSTEM MUST FOLLOW THE RAPIDS/AMBA ORGANIZATIONAL PATTERN - NO EXCEPTIONS**

## Required Directory Structure

Required DW entry structure  
projects/components/bridge/

```

(generators, scripts)
    └── bridge_generator.py      # AXI4 crossbar generator
docs/
dv/
structure)
    ├── tbclasses/             # Testbench classes (MANDATORY - TB
    classes here!)
        ├── __init__.py
        └── bridge_axi4_flat_tb.py # Reusable TB class
    └── components/            # Bridge-specific BFM
        └── __init__.py
    └── scoreboards/          # Bridge-specific scoreboards (if
needed)
        └── __init__.py
    tests/                   # All test files (test runners
only)
        ├── conftest.py
        ├── fub_tests/
            └── basic/
        ├── integration_tests/
        └── system_tests/
    └── rtl/                  # RTL source files
        └── generated/
    └── CLAUDE.md             # This file
    └── PRD.md                # Product requirements
    └── IMPLEMENTATION_STATUS.md # Development status

```

## Testbench Class Location (MANDATORY)

**✗ WRONG:** Testbench class in test file

```
# projects/components/bridge/dv/tests/test_bridge_axi4_2x2.py
class BridgeAXI4FlatTB: # ✗ WRONG LOCATION!
    """Embedded TB - NOT REUSABLE"""

```

**✓ CORRECT:** Testbench class in PROJECT AREA dv/tbclasses/

```
# projects/components/bridge/dv/tbclasses/bridge_axi4_flat_tb.py
class BridgeAXI4FlatTB(TBBase): # ✓ CORRECT LOCATION!
    """Reusable TB class - used across all bridge tests"""

```

**CRITICAL:** TB classes are PROJECT-SPECIFIC and MUST be in the project area (projects/components/{name}/dv/tbclasses/), NOT in the framework (bin/CocoTBFramework/).

## Test File Pattern (MANDATORY)

Test files MUST follow this structure:

```

#
# projects/components/bridge/dv/tests/fub_tests/basic/test_bridge_axi4_2
x2.py

import os
import pytest
import cocotb
from cocotb_test.simulator import run

# ✓ IMPORT testbench class from PROJECT AREA
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__),
'../../../../'))
from projects.components.bridge.dv.tbclasses.bridge_axi4_flat_tb
import BridgeAXI4FlatTB
from CocoTBFramework.tbclasses.shared.utilities import get_paths,
create_view_cmd
from CocoTBFramework.tbclasses.shared.tbbase import TBBBase

#
=====
=====

# COCOTB TEST FUNCTIONS - Prefix with "cocotb_" to prevent pytest
collection
#
=====

@cocotb.test(timeout_time=100, timeout_unit='us')
async def cocotb_test_basic_routing(dut):
    """Test basic address routing"""
    tb = BridgeAXI4FlatTB(dut, num_masters=2, num_slaves=2) # ✓
Import TB
    await tb.setup_clocks_and_reset()
    result = await tb.test_basic_routing()
    assert result, "Basic routing test failed"

# More cocotb test functions...

#
=====
=====

# PARAMETER GENERATION - At bottom of file
#
=====

def generate_bridge_test_params():

```

```

"""Generate test parameters for bridge tests"""
return [
    # (num_masters, num_slaves, data_width, addr_width, id_width)
    (2, 2, 32, 32, 4),
    (4, 4, 32, 32, 4),
]
bridge_params = generate_bridge_test_params()

#
=====
=====  

# PYTEST WRAPPER FUNCTIONS - At bottom of file
#
=====  

=====

@pytest.mark.bridge
@pytest.mark.routing
@pytest.mark.parametrize("num_masters, num_slaves, data_width,
addr_width, id_width", bridge_params)
def test_basic_routing(request, num_masters, num_slaves, data_width,
addr_width, id_width):
    """Pytest wrapper for basic routing test"""
    module, repo_root, tests_dir, log_dir, rtl_dict = get_paths({
        'rtl_bridge': '../..//rtl'
    })

    # ... setup verilog_sources, parameters, etc ...

    run(
        verilog_sources=verilog_sources,
        toplevel=f"bridge_axi4_flat_{num_masters}x{num_slaves}",
        module=module,
        testcase="cocotb_test_basic_routing", # ← cocotb function
name
        parameters=rtl_parameters,
        sim_build=sim_build,
        # ...
    )

```

---

## Critical Rules for This Subsystem

### Rule #0: Attribution Format for Git Commits

**IMPORTANT:** When creating git commit messages for Bridge documentation or code:

**Use:**

Documentation and implementation support by Claude.

**Do NOT use:**

Co-Authored-By: Claude <noreply@anthropic.com>

**Rationale:** Bridge receives AI assistance for structure and clarity, while design concepts and architectural decisions remain human-authored.

### Rule #0.1: Testbench Architecture - MANDATORY SEPARATION

⚠ THIS IS A HARD REQUIREMENT - NO EXCEPTIONS ⚠

**NEVER embed testbench classes inside test runner files!**

The same testbench logic will be reused across multiple test scenarios. Having testbench code only in test files makes it COMPLETELY WORTHLESS for reuse.

**MANDATORY Structure:**

```
projects/components/bridge/
  └── dv/
    └── tbclasses/                      # TB classes HERE (not in
      └── __init__.py                   framework!)
      └── bridge_axi4_flat_tb.py     ← REUSABLE TB CLASS
      └── components/                 # Bridge-specific BFM (if
        └── __init__.py
        └── scoreboards/             # Bridge-specific scoreboards
          └── __init__.py
        └── tests/                  # Test runners
          └── conftest.py            ← MANDATORY pytest config
          └── fub_tests/
            └── basic/
              └── test_bridge_axi4_2x2.py ← TEST RUNNER ONLY
      (imports TB)
        └── integration_tests/
          └── test_bridge_multiport.py   ← TEST RUNNER ONLY
        └── system_tests/
          └── test_bridge_system.py     ← TEST RUNNER ONLY
```

## Why This Matters:

1. **Reusability:** Same TB class used in:
    - Unit tests (fub\_tests/)
    - Integration tests (integration\_tests/)
    - System-level tests (system\_tests/)
    - User projects (external imports)
  2. **Maintainability:** Fix bug once in TB class, all tests benefit
  3. **Composition:** TB classes can inherit/compose for complex scenarios
- 

## Rule #1: All Testbenches Inherit from TBBBase

**Every testbench class MUST inherit from TBBBase:**

```
from CocoTBFramework.tbclasses.shared.tbbbase import TBBBase

class BridgeAXI4FlatTB(TBBBase):
    """Testbench for Bridge crossbar - inherits base functionality"""

    def __init__(self, dut, num_masters=2, num_slaves=2, **kwargs):
        super().__init__(dut)
        # Bridge-specific initialization
```

**TBBBase Provides:** - Clock management (start\_clock, wait\_clocks) - Reset utilities (assert\_reset, deassert\_reset) - Logging (self.log) - Progress tracking (mark\_progress) - Safety monitoring (timeouts, memory limits)

---

## Rule #2: Mandatory Testbench Methods

**Every testbench class MUST implement these three methods:**

```
async def setup_clocks_and_reset(self):
    """Complete initialization - starts clocks and performs reset"""
    await self.start_clock('aclk', freq=10, units='ns')

    # Set config signals before reset (if needed)
    # self.dut.cfg_param.value = initial_value

    # Reset sequence
    await self.assert_reset()
    await self.wait_clocks('aclk', 10)
```

```

await self.deassert_reset()
await self.wait_clocks('aclk', 5)

async def assert_reset(self):
    """Assert reset signal (active-low for AXI4)"""
    self.dut.aresetn.value = 0

async def deassert_reset(self):
    """Deassert reset signal"""
    self.dut.aresetn.value = 1

```

**Why Required:** - Consistency across all testbenches - Reusability for mid-test resets - Clear test structure and intent

---

### Rule #3: Use GAXI Components for Protocol Handling

**For Bridge testing, use GAXI Master/Slave components for AXI4 channel handling:**

```

from CocoTBFramework.components.gaxi.gaxi_master import GAXIMaster
from CocoTBFramework.components.gaxi.gaxi_slave import GAXISlave
from CocoTBFramework.components.axi4.axi4_field_configs import
AXI4FieldConfigHelper

# ✓ CORRECT: Use GAXI for AXI4 channels
self.aw_master = GAXIMaster(
    dut=dut,
    title="Aw_M0",
    prefix="s0_axi4_",
    clock=clock,

field_config=AXI4FieldConfigHelper.create_aw_field_config(id_width,
addr_width, 1),
    pkt_prefix="aw",
    multi_sig=True,
    log=log
)

```

**Never manually drive AXI4 valid/ready handshakes** - Use GAXI components.

---

### Rule #4: Queue-Based Verification

**For simple in-order verification, use direct monitor queue access:**

```

# ✓ CORRECT: Direct queue access
aw_pkt = self.aw_slave._recvQ.popleft()
w_pkt = self.w_slave._recvQ.popleft()

# Verify
assert aw_pkt.addr == expected_addr
assert w_pkt.data == expected_data

# ✗ WRONG: Memory model for simple test
memory_model = MemoryModel() # Unnecessary complexity

```

**When to Use Memory Models:** - ✗ Simple in-order tests → Use queue access - ✗ Single-master systems → Use queue access - ✓ Complex out-of-order scenarios → Memory model may help - ✓ Multi-master with address overlap → Memory model tracks state

---

## TOML/CSV-Based Bridge Generator (Phase 2 Complete)

### Overview

The Bridge generator creates parameterized SystemVerilog crossbars from TOML configuration files with CSV connectivity matrices, eliminating manual RTL editing for complex interconnects.

**Key Benefits:** - **Human-readable configuration** - TOML for ports, CSV for connectivity - **Custom signal prefixes** - Each port has unique prefix (rapids\_m\_axi\_, apb0\_, etc.) - **Channel-specific masters** - Write-only (wr), read-only (rd), or full (rw) - **Interface modules** - Timing isolation via axi4\_master/slave wrappers with configurable skid depths - **Automatic converters** - Width and protocol conversion inserted automatically - **Resource efficient** - Only generates needed channels and converters

### Quick Start

#### 1. Create bridge\_mybridge.toml:

```

[bridge]
name = "bridge_mybridge"
description = "Custom bridge example"

# Default skid buffer depths (can be overridden per port)
defaults.skid_depths = {ar = 2, r = 4, aw = 2, w = 4, b = 2}

masters = [

```

```

    {name = "cpu", prefix = "cpu_m_axi", id_width = 4, addr_width =
32, data_width = 64, user_width = 1,
    interface = {type = "axi4_master"}},
    {name = "dma", prefix = "dma_m_axi", id_width = 4, addr_width =
32, data_width = 512, user_width = 1,
    interface = {type = "axi4_master", skid_depths = {ar = 4, r = 8,
aw = 4, w = 8, b = 4}}}
]

slaves = [
    {name = "ddr", prefix = "ddr_s_axi", id_width = 4, addr_width =
32, data_width = 512, user_width = 1,
    base_addr = 0x00000000, addr_range = 0x80000000, interface =
{type = "axi4_slave"}},
    {name = "sram", prefix = "sram_s_axi", id_width = 4, addr_width =
32, data_width = 256, user_width = 1,
    base_addr = 0x80000000, addr_range = 0x80000000, interface =
{type = "axi4_slave"}}
]

```

## 2. Create bridge\_mybridge\_connectivity.csv:

```
master\slave,ddr,sram
cpu,1,1
dma,1,1
```

## 3. Generate bridge:

```
cd projects/components/bridge/bin
python3 bridge_generator.py --ports test_configs/bridge_mybridge.toml
# Auto-finds bridge_mybridge_connectivity.csv

# Or use bulk generation
python3 bridge_generator.py --bulk bridge_batch.csv
```

**Result:** Complete SystemVerilog module with: - Custom port prefixes per port - Timing isolation via interface wrappers - Only needed AXI4 channels (wr/rd/rw optimized) - Width converters for data mismatches - Internal crossbar instantiation - APB/AXI4-Lite converter integration points

## Configuration Format Details

### TOML Port Configuration:

The primary format is now **TOML** (preferred over CSV for better structure and interface configuration):

**Port Specifications:** - name - Unique identifier (cpu, dma, ddr, etc.) - prefix - Signal prefix (cpu\_m\_axi\_, ddr\_s\_axi\_, etc.) - id\_width - AXI4 ID width in bits -

addr\_width - Address width in bits - data\_width - Data width in bits (32, 64, 128, 256, 512) - user\_width - AXI4 user signal width - base\_addr - Slave base address (slaves only) - addr\_range - Slave address range (slaves only) - interface - Interface wrapper configuration (optional) - type - “axi4\_master”, “axi4\_slave”, “axi4\_master\_mon”, “axi4\_slave\_mon”, or omit for direct connection - skid\_depths - Per-channel buffer depths: {ar, r, aw, w, b} (valid: 2, 4, 6, 8)

### CSV Connectivity Matrix:

```
master\slave,slave0,slave1,slave2
master0,1,0,1
master1,0,1,1
```

- 1 = connected, 0 = not connected
- Partial connectivity supported (not all masters to all slaves)
- Auto-detected based on TOML filename: `bridge_name.toml` → `bridge_name_connectivity.csv`

### Legacy CSV Format:

For backwards compatibility, the generator still supports CSV port files: - See `test_configs/README.md` for migration guide from CSV to TOML - TOML is now preferred for new bridges (better structure, interface config support)

## Channel-Specific Masters (Phase 2 Feature)

**Why Channel-Specific?** Real hardware often has dedicated read or write masters. Generating all 5 AXI4 channels wastes resources:

### Traditional (wasteful):

```
// Write-only master gets unused read channels


```

### Channel-Specific (optimized):

```
rapids_descr_wr,master,axi4,wr,rapids_descr_m_axi_,512,64,8,N/A,N/A
```

### Generated:

```
// Write-only master - only AW, W, B channels


```

```

output logic [7:0] rapids_descr_m_axi_bid,
// ✓ NO READ CHANNELS (araddr, rdata, etc.)

```

**Resource Savings:** - 40-60% fewer ports for dedicated masters - Only necessary width converters instantiated - Channel-aware direct connection wiring - Faster synthesis, smaller netlists

### Example: RAPIDS-Style Configuration

**RAPIDS Architecture:** - Descriptor write master (wr) - Writes descriptors to memory - Sink write master (wr) - Writes incoming packets to memory - Source read master (rd) - Reads outgoing packets from memory - CPU master (rw) - Full access for configuration

### TOML Configuration (bridge\_rapids.toml):

```

[bridge]
  name = "bridge_rapids"
  description = "RAPIDS accelerator bridge"
  defaults.skid_depths = {ar = 2, r = 4, aw = 2, w = 4, b = 2}

  masters = [
    {name = "rapids_descr_wr", prefix = "rapids_descr_m_axi", channels =
 = "wr",
      id_width = 8, addr_width = 64, data_width = 512, user_width = 1,
      interface = {type = "axi4_master"}},
    {name = "rapids_sink_wr", prefix = "rapids_sink_m_axi", channels =
 = "wr",
      id_width = 8, addr_width = 64, data_width = 512, user_width = 1,
      interface = {type = "axi4_master"}},
    {name = "rapids_src_rd", prefix = "rapids_src_m_axi", channels =
 = "rd",
      id_width = 8, addr_width = 64, data_width = 512, user_width = 1,
      interface = {type = "axi4_master"}},
    {name = "cpu", prefix = "cpu_m_axi", channels = "rw",
      id_width = 4, addr_width = 32, data_width = 64, user_width = 1,
      interface = {type = "axi4_master"}}
  ]

  slaves = [
    {name = "ddr", prefix = "ddr_s_axi", protocol = "axi4",
      id_width = 8, addr_width = 64, data_width = 512, user_width = 1,
      base_addr = 0x80000000, addr_range = 0x80000000,
      interface = {type = "axi4_slave"}},
    {name = "apb_periph", prefix = "apb0_", protocol = "apb",
      addr_width = 32, data_width = 32,
      base_addr = 0x00000000, addr_range = 0x00010000}
  ]

```

## Connectivity CSV (`bridge_rapids_connectivity.csv`):

```
master\slave,ddr,apb_periph  
rapids_descr_wr,1,0  
rapids_sink_wr,1,0  
rapids_src_rd,1,0  
cpu,1,1
```

**Generated RTL Features:** - rapids\_descr\_wr: 37 signals (write channels only) vs 61 signals (full) = **39% reduction** - rapids\_sink\_wr: 37 signals (write channels only) vs 61 signals (full) = **39% reduction** - rapids\_src\_rd: 24 signals (read channels only) vs 61 signals (full) = **61% reduction** - cpu\_master: Width converters for 64b→512b upsize (both wr and rd converters) - ddr\_controller: Direct 512b connection (no conversion) - apb\_periph0: APB converter placeholder (Phase 3)

## Common User Questions

**Q: “How do I generate a bridge?”**

**A: Three steps:**

1. Create TOML port configuration file
2. Create CSV connectivity matrix
3. Run bridge\_generator.py

```
# Create bridge_mybridge.toml (port configuration)  
# Create bridge_mybridge_connectivity.csv (connectivity matrix)
```

```
cd projects/components/bridge/bin  
python3 bridge_generator.py --ports test_configs/bridge_mybridge.toml  
# Auto-finds bridge_mybridge_connectivity.csv
```

```
# Or use bulk generation for multiple bridges  
python3 bridge_generator.py --bulk bridge_batch.csv
```

**Q: “What’s the difference between wr/rd/rw channels?”**

**A: Number of AXI4 channels generated:**

- **rw** (read/write) - All 5 channels: AW, W, B, AR, R
- **wr** (write-only) - 3 channels: AW, W, B (no read channels)
- **rd** (read-only) - 2 channels: AR, R (no write channels)

**Benefits:** 40-60% fewer ports for dedicated masters, less logic, faster synthesis

**Q: “Can I add timing isolation to ports?”**

**A: Yes, use interface configuration:**

```
masters = [
    {name = "cpu", prefix = "cpu_m_axi", ...,
     interface = {type = "axi4_master", skid_depths = {ar = 2, r = 4, aw
= 2, w = 4, b = 2}}}
]
```

Available interface types: - "axi4\_master" - Timing isolation on master port - "axi4\_slave" - Timing isolation on slave port - "axi4\_master\_mon" - Timing + monitoring - "axi4\_slave\_mon" - Timing + monitoring - Omit interface field for direct connection

**Q: “Can I mix AXI4 and APB slaves?”**

**A: Yes, use protocol field in TOML:**

```
slaves = [
    {name = "ddr", prefix = "ddr_s_axi", protocol = "axi4", ...},
    {name = "apb0", prefix = "apb0_", protocol = "apb", ...}
]
```

Generator inserts AXI2APB converters automatically (Phase 3 for full implementation).

**Q: “What if data widths don’t match?”**

**A: Generator inserts width converters automatically:**

```
masters = [
    {name = "cpu", data_width = 64, ...}, # 64b master
]
slaves = [
    {name = "ddr", data_width = 512, ...} # 512b slave
]
# Generator creates width converter: 64b → 512b
```

**Q: “Do all masters need to connect to all slaves?”**

**A: No, use partial connectivity in CSV:**

```
master\slave,ddr,sram,apb
rapids_descr,1,1,0      # Connects to ddr and sram only
cpu,1,1,1                # Connects to all three
```

## Generator Output Structure

**Generated File Contains:**

1. **Module Header** - Parameterized with NUM\_MASTERS, NUM\_SLAVES, widths
2. **Port Declarations** - Custom prefix per port, channel-specific signals
3. **Internal Signals** - Crossbar interface arrays (`xbar_m_`, `xbar_s_`)
4. **Width Converters** - Master-side upsize instances (channel-aware)
5. **Crossbar Instance** - Internal AXI4 full crossbar
6. **Direct Connections** - For matching-width interfaces
7. **APB Converters** - Placeholder TODO comments (Phase 3)

**Example Output Size:** - Simple 2x2 bridge: ~400 lines - Complex 5x3 with mixed protocols: ~900 lines - Includes comprehensive comments and structure

## Testing Generated Bridges

**For Pure AXI4 Bridges (Working Now):**

```
# Generate bridge
python3 bridge_csv_generator.py --ports ports.csv --connectivity
conn.csv --name my_bridge --output ../rtl/


---


# Create testbench (use BridgeAXI4FlatTB from dv/tbclasses/)
cd ../dv/tests/fub_tests/basic
pytest test_my_bridge.py -v
```

**For Mixed AXI4/APB (Requires Phase 3):** APB converter placeholders need implementation before end-to-end testing.

---

## Bridge Architecture Quick Reference

### Generated Bridge Crossbar Structure

```
module bridge_axi4_flat_2x2 #(
    parameter NUM_MASTERS = 2,
    parameter NUM_SLAVES = 2,
    parameter DATA_WIDTH = 32,
    parameter ADDR_WIDTH = 32,
    parameter ID_WIDTH = 4
) (
    input logic aclk,
    input logic aresetn,
    // Master-side interfaces (slave ports on bridge)
    // AW, W, B, AR, R channels for each master
    // Slave-side interfaces (master ports on bridge)
```

```
// AW, W, B, AR, R channels for each slave  
);
```

## Key Features

1. **5-Channel Implementation:** Complete AW, W, B, AR, R routing
2. **Round-Robin Arbitration:** Per-slave arbitration with grant locking
3. **Transaction ID Tracking:** Distributed RAM for B/R response routing
4. **ID-Based Routing:** Enables out-of-order responses
5. **Configurable Parameters:** NxM topology, data/addr/ID widths

## Address Map

Default address map (configurable): - Slave 0: 0x00000000 - 0x0FFFFFFF (256MB) - Slave 1: 0x10000000 - 0x1FFFFFFF (256MB) - Slave N: N \* 0x10000000

---

## Test Organization

### Test Hierarchy

```
projects/components/bridge/dv/tests/  
└── conftest.py           # Pytest configuration with fixtures  
└── fub_tests/            # Functional unit block tests  
    └── basic/  
        ├── test_bridge_axi4_2x2.py      # Basic 2x2 tests  
        ├── test_bridge_axi4_4x4.py      # Full 4x4 tests  
        └── test_bridge_axi4_routing.py  # Routing tests  
└── integration_tests/      # Multi-bridge scenarios  
    └── test_bridge_cascade.py  
└── system_tests/          # Full system tests  
    └── test_bridge_dma.py
```

### Test Levels

**Basic (FUB) Tests:** - Individual bridge functionality - Address routing - ID tracking - Arbitration

**Integration Tests:** - Multi-bridge cascades - Complex topologies - Cross-bridge transactions

**System Tests:** - Full DMA transfers - Realistic traffic patterns - Performance validation

---

## Common User Questions and Responses

**Q: "How does the Bridge work?"**

**A: Direct answer:**

The Bridge AXI4 crossbar connects multiple AXI4 masters to multiple slaves:

1. **Address Decode (AW/AR):** Routes master requests to appropriate slave based on address
2. **Write Path:** AW → arbitration → slave, W follows locked grant
3. **Read Path:** AR → arbitration → slave, R returns via ID table
4. **Arbitration:** Per-slave round-robin with grant locking until burst complete
5. **Response Routing:** B/R responses use ID lookup tables (not grant-based)

**Key Features:** - Out-of-order response support via ID tracking - Burst-aware arbitration (grant locked until WLAST/RLAST) - Configurable NxM topology - Single-clock domain

 **See:** - projects/components/bridge/PRD.md - Complete specification - projects/components/bridge/bin/bridge\_generator.py - Generator implementation

**Q: "How do I generate a Bridge?"**

**A: Use the bridge\_generator.py tool:**

```
cd projects/components/bridge/bin
python bridge_generator.py --masters 2 --slaves 4 --data-width 32 --
addr-width 32 --id-width 4 --output ../rtl/bridge_axi4_flat_2x4.sv
```

**Generated files:** - RTL:

projects/components/bridge/rtl/bridge\_axi4\_flat\_<config>.sv - Contains complete 5-channel crossbar with ID tracking

**Q: "How do I test a Bridge?"**

**A: Use the Bridge testbench class:**

```
# Import from project area
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__),
'../../../../'))
from projects.components.bridge.dv.tbclasses.bridge_axi4_flat_tb
import BridgeAXI4FlatTB
```

```

@cocotb.test()
async def test_basic(dut):
    tb = BridgeAXI4FlatTB(dut, num_masters=2, num_slaves=2)
    await tb.setup_clocks_and_reset()

    # Send write to slave 0
    await tb.write_transaction(master_idx=0, address=0x00001000,
data=0xDEADBEEF)

    # Verify routing
    assert len(tb.aw_slaves[0]._recvQ) > 0, "Slave 0 should receive AW"

```

**Run tests:**

```

cd projects/components/bridge/dv/tests/fub_tests/basic
pytest test_bridge_axi4_2x2.py -v

```

---

## Anti-Patterns to Avoid

### X Anti-Pattern 1: Embedded Testbench Classes

✗ WRONG: TB **class** in test file

```

class BridgeTB:
    """NOT REUSABLE - WRONG LOCATION"""

```

✓ CORRECT: Import **from** project area

```

import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__),
'../../../../'))
from projects.components.bridge.dv.tbclasses.bridge_axi4_flat_tb
import BridgeAXI4FlatTB

```

### X Anti-Pattern 2: Manual AXI4 Handshaking

✗ WRONG: Manual signal driving

```

self.dut.s0_axi4_awvalid.value = 1
while self.dut.s0_axi4_awready.value == 0:
    await RisingEdge(self.clock)

    ✓ CORRECT: Use GAXI components
    await self.aw_master.send(aw_pkt)

```

### X Anti-Pattern 3: Memory Models for Simple Tests

✗ WRONG: Unnecessary complexity

```

memory = MemoryModel()
memory.write(addr, data)
result = memory.read(addr)

```

```
✓ CORRECT: Direct queue verification
aw_pkt = self.aw_slave._recvQ.popleft()
assert aw_pkt.addr == expected_addr
```

---

## Quick Reference

### Finding Existing Components

```
# Bridge TB class (in PROJECT AREA, not framework!)
cat projects/components/bridge/dv/tbclasses/bridge_axi4_flat_tb.py
```

```
# Bridge generator
cat projects/components/bridge/bin/bridge_generator.py
```

```
# Test examples
ls projects/components/bridge/dv/tests/fub_tests/basic/
```

### Common Commands

```
# Generate 2x2 bridge
python projects/components/bridge/bin/bridge_generator.py --masters 2
--slaves 2 --output
projects/components/bridge/rtl/bridge_axi4_flat_2x2.sv
```

```
# Run tests
pytest projects/components/bridge/dv/tests/fub_tests/basic/ -v
```

```
# Lint generated RTL
verilator --lint-only
projects/components/bridge/rtl/bridge_axi4_flat_2x2.sv
```

---

## Remember

1.  **MANDATORY: Testbench architecture** - TB classes in framework, tests import them
2.  **MANDATORY: Directory structure** - Follow RAPIDS/AMBA pattern exactly
3.  **MANDATORY: conftest.py** - Must exist in dv/tests/
4.  **Use GAXI components** - Never manually drive AXI4 handshakes
5.  **Queue-based verification** - Simple tests use direct queue access
6.  **Three-layer architecture** - TB (framework) + Test (runner) + Scoreboard (verification)

7. **Three mandatory methods** - setup\_clocks\_and\_reset, assert\_reset, deassert\_reset
  8. **Search first** - Use existing components before creating new ones
  9. **Test scalability** - Support basic/medium/full test levels
  10. **100% success** - All tests must achieve 100% success rate
- 

## PDF Generation Location

**IMPORTANT: PDF files should be generated in the docs directory:**

/mnt/data/github/rtldesignsherpa/projects/components/bridge/docs/

**Quick Command:** Use the provided shell script:

```
cd /mnt/data/github/rtldesignsherpa/projects/components/bridge/docs  
./generate_pdf.sh
```

The shell script will automatically: 1. Use the md\_to\_docx.py tool from bin/ 2. Process the bridge\_spec index file 3. Generate both DOCX and PDF files in the docs/ directory 4. Create table of contents and title page

**See:** bin/md\_to\_docx.py for complete implementation details

---

**Version:** 1.0 **Last Updated:** 2025-10-18 **Maintained By:** RTL Design Sherpa Project

## Bridge Component - Task List

**Component:** Bridge (AXI4 Crossbar Generators) **Last Updated:** 2025-11-04

**Version:** 2.2 **Current Status:** Major Milestones Complete - Wrapper Integration + Intelligent Routing (All Phases)

---

## Quick Status

**Generators:** - ✓ bridge\_generator.py - Framework-based MxN crossbar (Complete) - ✓ bridge\_csv\_generator.py - CSV-configured crossbar with channel-specific masters (Phase 2 Complete) - ✓ bridge\_model.py - Performance modeling V1 Flat (Complete)

**Phases:** - ✓ Phase 1: CSV Configuration (Complete 2025-10-25) - ✓ Phase 2: Channel-Specific Masters (Complete 2025-10-26) - ✓ Phase 3: APB Converter Integration (Complete 2025-11-10)

**Major Architectural Features:** - ✓ TASK-000A: AXI4 Interface Wrapper Integration (Complete 2025-11-04) - ✓ TASK-000: Intelligent Width-Aware Routing - All 4 Phases (Complete 2025-11-04) - Zero-latency direct paths for matching widths - Intelligent converter placement (only where needed) - Per-slave arbitration with burst locking - 60% zero-latency connections in mixed-width configs

 **See also:** - BRIDGE\_CURRENT\_STATE.md - Detailed current state review - IMPLEMENTATION\_STATUS.md - Phase status summary - TESTING.md - Test compliance documentation

---

## Task Status Legend

-  **Blocked** - Cannot proceed due to dependencies or issues
-  **In Progress** - Currently being worked on
-  **Planned** - Scheduled for upcoming work
-  **Complete** - Finished and verified

## Priority Levels

- **P0** - Critical path, blocks other work
  - **P1** - High priority, needed for current phase
  - **P2** - Medium priority, nice to have
  - **P3** - Low priority, future enhancement
- 

## Active Tasks (Phase 3 and Beyond)

### TASK-000A: AXI4 Interface Wrapper Integration (CRITICAL)

**Status:**  **COMPLETE** (2025-11-04) **Priority:** P0 (Foundational Architecture)

**Effort:** 2-3 weeks (Completed) **Owner:** Claude Code AI

**Description:** Replace raw AXI4 signal ports with standard AXI4 interface wrappers for proper buffering, timing, and optional monitoring.

**Current State (WRONG):**

```

// Bridge has raw AXI signals at boundaries
module bridge_1x2_rd (
    input logic [63:0] cpu_m_axi_araddr,
    input logic         cpu_m_axi_arvalid,
    output logic        cpu_m_axi_arready,
    // ... dozens of raw signals
);

```

### Target Architecture (REQUIRED):

```

// Bridge uses standard interface wrappers
module bridge_1x2_rd (
    input logic aclk,
    input logic aresetn,

    // Master interfaces use axi4_master_rd/wr wrappers
    // (These provide buffering, timing closure, optional monitoring)

    // Slave interfaces use axi4_slave_rd/wr wrappers
);

// Inside bridge: Instantiate interface wrappers
axi4_master_rd #(
    .SKID_DEPTH_AR(2),
    .SKID_DEPTH_R(2),
    .AXI_ID_WIDTH(8),
    .AXI_ADDR_WIDTH(64),
    .AXI_DATA_WIDTH(64)
) u_cpu_master_rd (
    .aclk(aclk), .aresetn(aresetn),
    // fub_axi_* connects to external port
    // m_axi_* connects to internal crossbar
);

axi4_slave_rd #(...) u_ddr_slave_rd (...);
axi4_slave_rd #(...) u_sram_slave_rd (...);

```

**Why This Matters:** 1. **Buffering:** Built-in skid buffers improve timing closure

2. **Standard Interfaces:** Aligns with AMBA infrastructure (rtl/amba/axi4/)

3. **Optional Monitoring:** Can enable \*\_mon.sv versions for debug/performance

4. **Reduced Integration Errors:** Standard wrappers vs error-prone raw signals

5. **Future-Proof:** Ready for monitor integration (TASK-020)

**Required Wrappers:** - ✓ axi4\_master\_wr.sv - Master write interface (existing) -  
✓ axi4\_master\_rd.sv - Master read interface (existing) - ✓ axi4\_slave\_wr.sv -  
Slave write interface (existing) - ✓ axi4\_slave\_rd.sv - Slave read interface  
(existing)

**Optional Monitor Versions (Phase 2 of this task):** - ✓ `axi4_master_wr_mon.sv` - With monitoring (existing) - ✓ `axi4_master_rd_mon.sv` - With monitoring (existing) - ✓ `axi4_slave_wr_mon.sv` - With monitoring (existing) - ✓ `axi4_slave_rd_mon.sv` - With monitoring (existing)

### **Implementation Complete:**

**Phase 1: Standard Wrappers ✓ COMPLETE** 1. ✓ Modified generator to instantiate interface wrappers 2. ✓ Port list generation (external `s_axi_*` boundary, internal `fub_axi_*`) 3. ✓ Channel-specific wrapper handling (wr/rd/rw) - wr: Only `axi4_slave_wr` wrapper (write channels) - rd: Only `axi4_slave_rd` wrapper (read channels) - rw: Both `axi4_slave_wr` and `axi4_slave_rd` wrappers 4. ✓ Configurable skid buffer depths (`SKID_DEPTH_AW`, `SKID_DEPTH_W`, etc.) 5. ✓ CSV generator fully updated 6. ✓ All test bridges regenerated with wrapper architecture 7. ✓ Testbench infrastructure compatible 8. ✓ Full regression testing complete (10/10 tests passing)

**Phase 2: Optional Monitoring (Future Enhancement)** - Monitoring support available via `*_mon.sv` variants - Not required for current Phase 2 completion - See TASK-020 for detailed monitoring integration plan

**Verification Evidence:** - ✓ Generated adapters use `axi4_slave_wr`/`axi4_slave_rd` wrappers (`cpu_master_adapter.sv`:205, :279) - ✓ Timing isolation: external `s_axi_*` vs internal `fub_axi_*` signals - ✓ Configurable skid buffers with depth parameters - ✓ Channel-specific instantiation (wr masters only get wr wrapper) - ✓ All 10 bridge tests pass with wrapper architecture (100% success rate)

**See Generated Examples:** - `projects/components/bridge/rtl/generated/bridge_4x4_rw/cpu_master_adapter.sv` - Complete wrapper integration - Lines 205-274: Write wrapper instantiation - Lines 279-348: Read wrapper instantiation

**Dependencies:** - Interface wrappers exist in `rtl/amba/axi4/` (✓ Available) - Test infrastructure works (✓ 16/16 passing baseline)

**Impact:** - **Timing:** Better timing closure from skid buffers - **Debugging:** Optional monitoring for error detection - **Maintainability:** Standard interfaces reduce integration errors - **Performance:** Buffering improves throughput under backpressure

**Related Tasks:** - TASK-020: Optional Monitor Integration (builds on this) - TASK-011: Replace Hand-Coded Arbiters (orthogonal)

**See also:** - [rtl/amba/axi4/axi4\\_master\\_rd.sv](#) - Read interface wrapper -  
- [rtl/amba/axi4/axi4\\_master\\_wr.sv](#) - Write interface wrapper -  
- [rtl/amba/CLAUDE.md](#) - AMBA interface patterns -  
- [docs/AXI\\_Monitor\\_Configuration\\_Guide.md](#) - Monitoring setup

---

## TASK-000: Intelligent Width-Aware Routing Architecture

**Status:**  **COMPLETE** - All 4 Phases Including Per-Slave Arbitration (2025-11-04)

**Priority:** P0 (Architecture Foundation) **Effort:** All phases complete **Owner:**

Claude Code AI **Completed:** 2025-11-04 (All phases) **Documentation:**  
[INTELLIGENT\\_ROUTING\\_STATUS.md](#)

**Description:** Rearchitect bridge generator to use intelligent per-master multi-width routing instead of fixed-width crossbar with double conversions.

### Target Architecture (IMPLEMENTED):

```
Master(64b) → Router → {  
    Direct Path → Slave_A(64b)           [0 conversions, ZERO latency]  
    Conv(64→128) → Slave_B(128b)        [1 conversion]  
    Conv(64→512) → Slave_C(512b)        [1 conversion]  
}
```

**Benefits Achieved:** - ✓ **60% zero-latency** direct connections in mixed-width configs - ✓ **80% fewer converters** (2 vs 10 in 4x4 example) - ✓ **Full native bandwidth** on all paths

### Phase 1: Foundation ✓ COMPLETE (2025-10-30)

1. ✓ **Routing Table Analysis**
  - `build_routing_table()` - Groups slaves by width for each master
  - `MasterRoutingPath` / `MasterRoutingInfo` data structures
  - Routing efficiency metrics (60% zero-latency for 4x4 test)
2. ✓ **Per-Path Signal Generation**
  - `generate_per_path_signals_v2()` - Creates signals at native slave widths
  - Signal naming: `m{master_idx}_path{width}b_{signal}`
  - Example: `m0_path64b_wdata`, `m0_path128b_wdata`
3. ✓ **Direct Connection Logic**
  - `generate_path_connections_v2()` - Zero-latency assigns for matching widths

- Converter TODO placeholders for mismatched widths
- 4. ✓ **Testing and Verification**
  - 2x2 config: 100% zero-latency (all matching widths)
  - 4x4 config: 60% zero-latency (3/5 direct, 2/5 converted)

## Phase 2: Address Decode ✓ COMPLETE (2025-10-30)

1. ✓ **Generate Address Decode Logic**
  - ✘ Decode master address to determine target slave (per-slave match logic)
  - ✘ Map slave to appropriate output path (slave-to-path case statement)
  - ✘ Generate path selection muxes (one-hot path selection)
  - ✘ Conditional routing based on address decode (AW/AR awvalid/arvalid)
  - ✘ Ready mux from selected path back to master
2. ✓ **Testing:**
  - ✘ Verified address decode generates correct RTL for 2x2 config
  - ✘ Verified multi-path routing for 4x4 mixed-width config
  - ✘ Confirmed cpu\_master (2 paths) address-selects correct path

**Phase 2 Limitations (To be addressed in Phases 3-4):** - W channel broadcasts to all paths (needs AW path tracking) - B/R channels OR all responses (needs transaction ID tracking) - No per-slave arbitration yet

## Phase 3: Converter Instantiation ✓ COMPLETE (2025-10-30)

1. ✓ **Replace TODO Placeholders**
  - ✘ Instantiate axi4\_dwidth\_converter\_wr/rd for converted paths
  - ✘ Connect external master → converter → path signals
  - ✘ Handle channel-specific converters (wr/rd only)
  - ✘ Fix routing mux / converter integration (no drive conflicts)
2. ✓ **Testing:**
  - ✘ Verified upsize converters (64→128, 64→256, 64→512)
  - ✘ Verified downsize converters (256→128)
  - ✘ 4x4 config generates with 2 wr + 2 rd converters
  - ✘ 5x3 config generates with 4 converters (2 paths, 2 converters each)

**Implementation Details:** - generate\_converter\_instance() creates axi4\_dwidth\_converter\_wr/rd instances - Routing muxes skip converter paths to

avoid drive conflicts - Both write and read converters for rw channel type - Write-only or read-only converters for wr/rd channel types

#### Phase 4: Per-Slave Arbitration ✓ COMPLETE (2025-11-04)

1. ✓ **Arbitration Logic Generation**
  - ✘ Generate per-slave request/grant vectors
  - ✘ Instantiate round-robin arbiters per slave
  - ✘ Generate slave-side muxes (selected master path → slave)
  - ✘ Grant locking until burst completion (WLAST/RLAST tracking)
2. ✓ **Multi-Master Testing:**
  - ✘ Multiple masters accessing same slave (verified in generated RTL)
  - ✘ Arbitration fairness (round-robin implementation)
  - ✘ No transaction corruption (ID-based response routing)
  - ✘ Test with mixed-width topologies (4x4 config: 64b/128b/256b/512b)
  - ✘ Resource efficiency: intelligent routing eliminates unnecessary converters
  - ✘ Performance validation: direct paths have zero conversion latency

#### Complete Architecture Verification:

All 4 phases have been verified in the generated RTL:

```
// Example: cpu_master_adapter.sv (64b master → 4 slaves: 32b, 64b, 128b, 256b)
```

```
// Phase 1: Per-Path Signal Generation ✓
logic [31:0] cpu_master_32b_awaddr; // Path for 32b slave
logic [63:0] cpu_master_64b_awaddr; // Path for 64b slave
logic [127:0] cpu_master_128b_awaddr; // Path for 128b slave
logic [255:0] cpu_master_256b_awaddr; // Path for 256b slave

// Phase 2: Address Decode Logic ✓
always_comb begin
    slave_select_aw = '0;
    if (fub_axi_awaddr <= 32'h3fffffff)
        slave_select_aw[0] = 1'b1; // Slave 0 (32b)
    else if (fub_axi_awaddr >= 32'h40000000 && fub_axi_awaddr <=
32'h7fffffff)
        slave_select_aw[1] = 1'b1; // Slave 1 (64b) - DIRECT!
    // ... more slaves ...
end

// Phase 3: Converter Instantiation ✓
```

```

axi4_dwidth_converter_wr #(.S_WIDTH(64), .M_WIDTH(32)) u_conv_wr_32b
(...);
// Direct passthrough for matching width (64→64) - NO CONVERTER!
axi4_dwidth_converter_wr #(.S_WIDTH(64), .M_WIDTH(128)) u_conv_wr_128b
(...);
axi4_dwidth_converter_wr #(.S_WIDTH(64), .M_WIDTH(256)) u_conv_wr_256b
(...);

// Phase 4: Response Routing with Arbitration ✓
case (slave_select_aw)
    4'b0001: fub_axi_awready = conv_32b_awready; // Via converter
    4'b0010: fub_axi_awready = cpu_master_64b_awready; // Direct
path!
    4'b0100: fub_axi_awready = conv_128b_awready; // Via converter
    4'b1000: fub_axi_awready = conv_256b_awready; // Via converter
endcase

```

**Key Achievement:** 64b→64b path has ZERO conversion latency (direct assign)!

**Dependencies:** - Current Phase 2 generator (✓ provides baseline) - Width converters exist (✓ axi4\_dwidth\_converter\_wr/rd)

**Benefits:** - 20-40% area reduction for mixed-width bridges - 0-cycle conversion overhead for matching widths - Better bandwidth utilization - Scalable to any width combination - Aligns with hardware efficiency best practices

**Related Files:** - bin/bridge\_csv\_generator.py (major rework) - bin/bridge\_channel\_router.py (update for multi-width) - bin/bridge\_address\_arbiter.py (width-aware routing)

**See also:** - CLAUDE.md Section “Target Architecture: Intelligent Width-Aware Routing”

---

## TASK-001: Phase 3 - APB Converter Implementation

**Status:** **Planned Priority:** P0 **Effort:** 5-6 weeks **Owner:** Unassigned

**Description:** Implement AXI4-to-APB converter module to complete Phase 3 of CSV bridge generator.

**Acceptance Criteria:** - [ ] Create axi4\_to\_apb\_converter.sv module - [ ] Implement AXI4 burst → APB sequential transaction splitting - [ ] Handle backpressure propagation (APB → AXI4 ARREADY/AWREADY) - [ ] Implement error mapping (PSLVERR → RRESP/BRESP) - [ ] Add width conversion support (if needed) - [ ] Integrate converter instantiation in bridge\_csv\_generator.py - [ ] Create

comprehensive CocoTB testbench - [ ] End-to-end testing with mixed AXI4/APB bridges

**Dependencies:** - Phase 2 complete (✓) - APB BFM available in CocoTBFramework (✓)

**Related Files:** - rtl/converters/axi4\_to\_apb\_converter.sv (to be created) - bin/bridge\_csv\_generator.py (update converter instantiation) - dv/tbclasses/apb\_converter\_tb.py (to be created)

**See also:** - docs/AXI4\_AXIL4\_CONVERTER\_ANALYSIS.md - Similar converter analysis

---

### TASK-002: Channel-Specific Master End-to-End Testing

**Status:** ● Planned **Priority:** P1 **Effort:** 1 week **Owner:** Unassigned

**Description:** Create comprehensive tests to verify Phase 2 channel-specific master implementation (wr/rd/rw).

**Acceptance Criteria:** - [ ] Verify wr-only masters have NO read channels generated - [ ] Verify rd-only masters have NO write channels generated - [ ] Verify rw masters have all 5 channels - [ ] Test signal count matches expectations (37 wr, 24 rd, 61 rw) - [ ] End-to-end write-only master test (stimulus → transaction → verification) - [ ] End-to-end read-only master test (stimulus → transaction → verification) - [ ] Mixed master topology test (wr + rd + rw in same bridge) - [ ] Verify resource savings in synthesis reports

**Dependencies:** - Phase 2 complete (✓) - BridgeAXI4FlatTB exists (✓)

**Related Files:** - dv/tests/test\_bridge\_channel\_specific.py (to be created) - dv/tbclasses/bridge\_axi4\_flat\_tb.py (may need channel-aware methods)

---

### TASK-003: Width Converter Integration Testing

**Status:** ● Planned **Priority:** P1 **Effort:** 1 week **Owner:** Unassigned

**Description:** Verify width converter instantiation and end-to-end operation in generated bridges.

**Acceptance Criteria:** - [ ] Test upsize converter (64b master → 512b crossbar) - [ ] Test downsize converter (512b master → 256b crossbar) - [ ] Verify burst handling through converters - [ ] Test backpressure propagation through converters - [ ]

Verify data integrity (write → read → compare) - [ ] Test channel-specific converters (wr-only needs wr converter only) - [ ] Performance characterization with converters

**Dependencies:** - axi4\_dwidth\_converter\_wr.sv exists (✓) - axi4\_dwidth\_converter\_rd.sv exists (✓)

**Related Files:** - dv/tests/test\_bridge\_width\_conversion.py (to be created) - rtl/converters/axi4\_dwidth\_converter\_\*.sv (existing)

---

#### **TASK-004: CSV Bridge Generator Documentation**

**Status:**  **Planned Priority:** P1 **Effort:** 3 days **Owner:** Unassigned

**Description:** Create comprehensive user guide for CSV bridge generator usage.

**Acceptance Criteria:** - [ ] CSV file format specification (ports.csv, connectivity.csv) - [ ] Channel-specific master guide (when to use wr/rd/rw) - [ ] Width converter configuration examples - [ ] APB integration guide (when Phase 3 complete) - [ ] Complete worked examples (RAPIDS-style, SoC integration) - [ ] Troubleshooting guide (common errors, solutions) - [ ] Performance implications of configuration choices

**Dependencies:** - Phase 2 complete (✓)

**Related Files:** - docs/CSV\_BRIDGE\_GENERATOR\_GUIDE.md (to be created) - CSV\_BRIDGE\_STATUS.md (existing, needs user-facing companion)

---

#### **TASK-005: Performance Characterization**

**Status:**  **Planned Priority:** P2 **Effort:** 2 days **Owner:** Unassigned

**Description:** Characterize and document crossbar performance (latency, throughput, resource usage).

**Acceptance Criteria:** - [ ] Measure crossbar routing latency (master request → slave response) - [ ] Measure arbitration latency (multiple masters contending) - [ ] Document throughput with/without width converters - [ ] Measure FPGA resource usage for different topologies (2x2, 4x4, 8x8) - [ ] Compare channel-specific vs full master resource usage - [ ] Document Fmax for different

configurations - [ ] Create performance model validation (compare to bridge\_model.py)

**Dependencies:** - bridge\_model.py V1 complete (✓)

**Related Files:** - docs/BRIDGE\_PERFORMANCE\_ANALYSIS.md (to be created) - bin/bridge\_model.py (existing)

---

## TASK-006: Address Decode Documentation

**Status:** ● Planned **Priority:** P2 **Effort:** 1 day **Owner:** Unassigned

**Description:** Document address decode algorithm and configuration in detail.

**Acceptance Criteria:** - [ ] Explain address decode logic (base\_addr + addr\_range)  
- [ ] Document address overlap detection - [ ] Add examples of different address maps  
- [ ] Include address width mismatch handling - [ ] Document decode error reporting (no slave match) - [ ] CSV address map configuration guide

**Dependencies:** - None

**Related Files:** - docs/ADDRESS\_DECODE\_GUIDE.md (to be created)

---

## TASK-007: Error Handling and Response Routing Documentation

**Status:** ● Planned **Priority:** P2 **Effort:** 1 day **Owner:** Unassigned

**Description:** Document error propagation and response routing in crossbar architecture.

**Acceptance Criteria:** - [ ] Document AXI4 BRESP/RRESP error types (OKAY, EXOKAY, SLVERR, DECERR) - [ ] Explain decode error generation (no slave match)  
- [ ] Document slave error propagation to requesting master - [ ] Explain ID-based response routing (B/R channels) - [ ] Add examples of error scenarios - [ ] Document timeout handling (if implemented)

**Dependencies:** - None

**Related Files:** - docs/ERROR\_HANDLING\_GUIDE.md (to be created)

---

## TASK-008: Wavedrom Timing Diagrams

**Status:**  **Planned Priority:** P2 **Effort:** 2 days **Owner:** Unassigned

**Description:** Create wavedrom JSON files illustrating crossbar operation.

**Acceptance Criteria:** - [ ] Create crossbar\_write\_flow.json (AW → W → B path) - [ ] Create crossbar\_read\_flow.json (AR → R path) - [ ] Create arbitration\_example.json (multi-master contention) - [ ] Create id\_routing\_example.json (out-of-order responses) - [ ] Create width\_conversion.json (converter operation) - [ ] Generate SVG/PNG from all JSON files - [ ] Place in docs/bridge\_spec/assets/waves/

**Dependencies:** - None

**Related Files:** - docs/bridge\_spec/assets/waves/\*.json

---

## TASK-009: PlantUML Architecture Diagrams

**Status:**  **Planned Priority:** P2 **Effort:** 1 day **Owner:** Unassigned

**Description:** Create PlantUML architecture diagrams complementing ASCII diagrams in BRIDGE\_ARCHITECTURE\_DIAGRAMS.md.

**Acceptance Criteria:** - [ ] Create crossbar\_architecture.puml (overall structure) - [ ] Create arbitration\_logic.puml (per-slave arbiter) - [ ] Create id\_tracking\_cam.puml (CAM lookup flow) - [ ] Create width\_converter\_integration.puml (converter placement) - [ ] Generate PNG/SVG from all PUML files - [ ] Place in docs/bridge\_spec/assets/puml/

**Dependencies:** - None

**Related Files:** - docs/bridge\_spec/assets/puml/\*.puml

---

## TASK-010: Synthesis and Implementation Guide

**Status:**  **Planned Priority:** P2 **Effort:** 2 days **Owner:** Unassigned

**Description:** Create guide for synthesizing and implementing generated bridges.

**Acceptance Criteria:** - [ ] Document synthesis flow (Vivado, Quartus) - [ ] Provide timing constraint examples - [ ] Document resource utilization expectations - [ ]

Add Fmax optimization tips - [ ] Include floorplanning guidance for large crossbars - [ ] Document power optimization strategies

**Dependencies:** - None

**Related Files:** - docs/SYNTHESIS\_GUIDE.md (to be created)

---

## Recently Completed Tasks

### ✓ Phase 2: Channel-Specific Masters (Complete - 2025-10-26)

**Impact:** 35-60% resource reduction for dedicated masters - Implemented wr-only master type (write channels: AW, W, B only) - Implemented rd-only master type (read channels: AR, R only) - Updated CSV parser to support 'channels' field (wr/rd/rw) - Modified RTL generation to emit only needed channels - Updated width converter instantiation (channel-aware) - Created channel-specific direct connection logic - Verified with test bridges (bridge\_1x2\_wr, bridge\_1x2\_rd, bridge\_2x2\_rw) - Updated documentation (CSV\_BRIDGE\_STATUS.md)

**Resource Savings Example:** - wr-only master: 37 signals (vs 61 for rw) = 39% reduction - rd-only master: 24 signals (vs 61 for rw) = 61% reduction - 4-master bridge (2wr + 1rd + 1rw): 159 signals (vs 244 for all rw) = 35% overall

### ✓ Phase 1: CSV Configuration System (Complete - 2025-10-25)

- Implemented CSV parser (ports.csv, connectivity.csv)
- Created PortSpec and SlaveConfig dataclasses
- Implemented custom port prefix generation
- Added width converter identification logic
- Created internal crossbar instantiation
- Generated working test bridges (bridge\_2x2\_rw, bridge\_4x4\_rw)
- Basic integration testing passed

### ✓ Framework-Based Generator (Complete - 2025-10-18)

- Implemented bridge\_generator.py with hierarchical modules
- Created BridgeAmaIntegrator, BridgeAddressArbiter, BridgeChannelRouter, BridgeResponseRouter
- Generated standard bridges (bridge\_axi4\_flat\_2x2.sv, bridge\_axi4\_flat\_4x4.sv)
- Array-indexed port naming (s\_axi\_awaddr[**NUM\_MASTERS**])

- AMBA component integration
- Functional verification complete

✓ **Performance Modeling (Complete - 2025-10-15)**

- Implemented bridge\_model.py with V1 Flat architecture
- Created BridgePerformanceModelV1Flat class
- Latency modeling for different topologies
- Resource estimation
- Validation against RTL simulation

✓ **Documentation Update (Complete - 2025-10-29)**

- Created BRIDGE\_CURRENT\_STATE.md (comprehensive review)
  - Created BRIDGE\_ARCHITECTURE\_DIAGRAMS.md (visual architecture)
  - Updated PRD.md to Version 2.0 (Phase 2 status)
  - Updated CLAUDE.md to Version 2.0 (current references)
  - Updated IMPLEMENTATION\_STATUS.md (complete rewrite, Phase 2 status)
  - Created DOCUMENTATION\_UPDATE\_SUMMARY.md
  - Created AXI4\_AXIL4\_CONVERTER\_ANALYSIS.md
  - Updated TESTING.md (test compliance standard)
- 

## Future Enhancements (Backlog)

### TASK-011: Replace Hand-Coded Arbiters with Standard Components

**Priority: P1 Description:** Replace hand-coded round-robin arbiters with arbiter\_round\_robin.sv from rtl/common/. **Rationale:** - Current Bridge has inline arbitration logic (~40 lines per arbiter) - arbiter\_round\_robin.sv with WAIT\_GNT\_ACK=1 is perfect for valid/ready handshaking - Proven component (95% test coverage) - Easier maintenance and consistent with design standards - Clear migration path to QoS via arbiter\_round\_robin\_weighted.sv

**Implementation:** - Replace AW/AR per-slave arbiters with arbiter\_round\_robin instances - Set WAIT\_GNT\_ACK=1 to hold grant until handshake completes - Update Bridge CSV generator to emit arbiter instantiations - Test with existing Bridge testbenches **Impact:** - Reduces code complexity (~40 lines → ~15 lines per arbiter) - Enables future QoS enhancement (TASK-017)

## **TASK-012: AXI Burst Optimization**

**Priority:** P3 **Description:** Optimize AXI burst transaction handling by pipelining APB accesses.

## **TASK-013: Outstanding Transaction Support**

**Priority:** P3 **Description:** Add support for multiple outstanding transactions (requires buffering).

## **TASK-013: Timeout Detection**

**Priority:** P3 **Description:** Implement configurable timeout counters for hung transactions.

## **TASK-014: APB3 to APB4 Bridge**

**Priority:** P3 **Description:** Create bridge between APB3 and APB4 protocol versions.

## **TASK-015: AXI4 ➔ AXI4-Lite Converter Implementation**

**Priority:** P3 **Effort:** 12-14 weeks **Description:** Create bidirectional AXI4 ➔ AXI4-Lite protocol converters for integration with CSV bridge generator.

**Phases:** - **Phase 3A:** AXI4-Lite → AXI4 upconvert (2 weeks) - Add default signals (ID, burst, etc.) - **Phase 3B:** AXI4 → AXI4-Lite downconvert (5-6 weeks) - Burst splitting, ID removal, response aggregation - **Integration:** CSV generator integration (2-3 weeks) - **Testing:** End-to-end testing with mixed topologies (2-3 weeks)

**See also:** - docs/AXI4\_AXIL4\_CONVERTER\_ANALYSIS.md - Complete analysis and effort estimates

## **TASK-016: Async Clock Domain Crossing**

**Priority:** P3 **Description:** Add optional async FIFO for APB and AXI on different clock domains.

## **TASK-017: Quality of Service (QoS) Support with Aging Counters**

**Priority:** P1 **Description:** Add configurable QoS (Quality of Service) support with aging counters to prevent starvation. **Rationale:** - Some masters require higher priority (CPU) vs others (debug) - Proportional bandwidth allocation (DMA gets 4x more than debug) - **CRITICAL: Must prevent low-priority masters from starving indefinitely** - Aging counters boost low-QoS request priority over time

to guarantee forward progress **Prerequisites:** - TASK-011 must be complete (use arbiter\_round\_robin.sv first) **Implementation:** - Replace arbiter\_round\_robin with arbiter\_round\_robin\_weighted - Add QoS weight inputs per master (configurable via parameters or runtime) - **Add aging counter at each master input (alongside AXI skid buffer wrappers)** - Counter increments each cycle while request is pending but not granted - When counter reaches threshold, boost QoS to maximum priority - Reset counter on grant - Configurable aging threshold (e.g., 256 cycles =  $\sim 2.5\mu s$  at 100MHz) - Update Bridge CSV generator to support QoS weight configuration - Test with mixed-priority traffic patterns AND starvation scenarios **QoS Weight Examples:** - High priority (CPU): base\_qos = 8 - Medium priority (DMA): base\_qos = 4 - Low priority (debug): base\_qos = 1 **Aging Logic:**

```

// Per-master aging counter
logic [7:0] aging_counter[NUM_MASTERS];
logic [3:0] effective_qos[NUM_MASTERS];

always_ff @(posedge aclk) begin
    for (int m = 0; m < NUM_MASTERS; m++) begin
        if (!aresetn) begin
            aging_counter[m] <= 8'b0;
        end else if (grant_valid && grant[m]) begin
            aging_counter[m] <= 8'b0; // Reset on grant
        end else if (request[m]) begin
            if (aging_counter[m] < AGING_THRESHOLD) begin
                aging_counter[m] <= aging_counter[m] + 1'b1;
            end
        end
    end
end

// Boost QoS when aged
always_comb begin
    for (int m = 0; m < NUM_MASTERS; m++) begin
        if (aging_counter[m] >= AGING_THRESHOLD) begin
            effective_qos[m] = MAX_QOS; // Boost to highest priority
        end else begin
            effective_qos[m] = base_qos[m]; // Use configured QoS
        end
    end
end

```

### Configuration Parameters:

|                                                   |                           |
|---------------------------------------------------|---------------------------|
| enable_aging: bool = True<br><i>(recommended)</i> | # Enable aging counters   |
| aging_threshold: int = 256                        | # Cycles before boost     |
| aging_counter_width: int = 8                      | # Counter width (max 255) |

```

cycles)
base_qos_weights: Dict[str, int] = {           # Per-master base QoS
    'cpu_master': 8,
    'dma_master': 4,
    'debug_master': 1
}

```

**Impact:** - Enables deterministic bandwidth allocation - **Guarantees forward progress for all masters (no starvation)** - Critical for real-time systems and mixed-criticality SoCs - Area increase: ~10 FFs per master (aging counter) + QoS arbiter logic **Test Requirements:** - Verify low-QoS master eventually gets serviced under heavy high-QoS load - Measure worst-case latency for low-priority requests - Confirm aging boost works correctly **See Also:** - BRIDGE\_QOS\_STARVATION\_PREVENTION.md (detailed design - to be created)

### TASK-018: CAM-Based Response Routing (Optional Enhancement)

**Priority:** P3 **Description:** Add optional CAM-based response routing for complex transaction scenarios. **Current Implementation:** - Simple RAM-based ID lookup (ID → Master mapping) - Assumes unique IDs for concurrent transactions - Works for 95% of use cases **CAM Enhancement Scenarios:** - Same ID outstanding to multiple slaves simultaneously - Need (ID, Slave) → Master composite key lookup - Complex transaction interleaving with aggressive ID reuse **Implementation Options:** 1. Extend existing cam\_tag.sv to cam\_lookup.sv with data storage 2. Add parameterizable USE\_CAM option (0=RAM, 1=CAM) 3. Document trade-offs (area vs flexibility) **CAM Lookup Module Features:** - Composite key: {transaction\_id, slave\_index} - Data: master\_index - Operations: insert, lookup, free - Parallel associative search **Trade-offs:** - RAM: Simple, fast, small area, assumes unique IDs - CAM: Flexible, handles ID reuse, 2-3x area increase **See Also:** - rtl/common/cam\_tag.sv - Existing CAM for tag presence - Need to extend for (key→data) mapping

### TASK-019: Pipeline FIFOs for High-Performance Crossbars

**Priority:** P3 **Description:** Add configurable pipeline FIFO stages for maximum throughput and timing closure. **Rationale:** - Current: Direct arbitration → mux → connection - High-performance: Arbitration → FIFO → Mux → FIFO → Connection - Benefits: Better timing closure (+20-50% fmax), higher throughput (100%), burst pipelining **Use Cases:** - High clock frequency targets (> 500 MHz) - Large crossbars (8x8, 16x16) with long routing paths - Burst-heavy traffic patterns - Timing closure challenges in complex SoCs **Implementation:** - Add post-arbitrer FIFO (shallow, depth=2, timing closure) - Add post-mux FIFO (deeper, depth=4,

throughput) - Use gaxi\_fifo\_sync from rtl/amba/gaxi/ - Configurable via BridgeConfig parameters **Configuration Options**:

```
pipeline_post_arbiter: bool = False      # FIFO after arbitration
pipeline_post_mux: bool = False          # FIFO after muxing
pipeline_depth_arb: int = 2              # Shallow for timing
pipeline_depth_mux: int = 4              # Deeper for throughput
```

**Trade-offs:** - Latency: +2-4 cycles per FIFO stage - Throughput: Up to 100% (eliminates combinational stalls) - Clock Frequency: +20-50% (breaks long combinational paths) - Area: +10-20% (FIFO resources) **Recommendation:** - Default: OFF (minimal latency for most use cases) - Enable: For high-performance designs or timing closure issues **See Also:** - BRIDGE\_REFATOR\_PLAN.md - Phase 3 detailed architecture - rtl/amba/gaxi/gaxi\_fifo\_sync.sv - FIFO implementation

## TASK-020: Optional Monitor Integration for Observability

**Priority:** P2 **Description:** Add optional integration of AXI/APB monitor components for rich performance tracking, debugging, error detection, and interrupt monitoring. **Rationale:** - Bridge is a critical crossbar component with complex transaction routing - Monitors provide visibility into: errors, performance bottlenecks, protocol violations, transaction flow - Debug-time observability without impacting production performance (monitors can be disabled)

**Use Cases:** - **Performance Analysis:** Track latency, throughput, bottlenecks per master/slave - **Debug:** Identify protocol violations, stalled transactions, timeout conditions - **Error Detection:** Catch SLVERR, DECERR, orphaned transactions - **Interrupt Monitoring:** Track error conditions that should trigger system interrupts **Available Monitor Components:** -

axi4\_master\_rd\_mon.sv, axi4\_master\_wr\_mon.sv - Master-side monitoring - axi4\_slave\_rd\_mon.sv, axi4\_slave\_wr\_mon.sv - Slave-side monitoring - apb\_monitor.sv - APB protocol monitoring (if Bridge has APB converters)

**Implementation Options:** 1. **Per-Port Monitors:** Optional monitor per master/slave interface 2. **Internal Crossbar Monitors:** Monitor internal arbitration points 3. **Aggregated Monitor Bus:** Collect all monitor packets to single FIFO output **Configuration:**

```
# Add to BridgeConfig
enable_monitoring: bool = False           # Global monitor enable
monitor_masters: List[int] = []           # Which masters to monitor
monitor_slaves: List[int] = []             # Which slaves to monitor
monitor_packet_types: Dict = {
    enable
    'error': True,
    'completion': False,  # High traffic
```

```

        'timeout': True,
        'performance': False # Enable separately for perf analysis
    }

```

## Integration Pattern:

```

// Per-master monitor (optional)
if (ENABLE_MONITORING && MONITOR_MASTERS[m]) begin
    axi4_master_wr_mon #(...) u_m${m}_wr_mon (
        // Connect to master interface signals
        .monbus_pkt_valid(mon_master_wr_valid[m]),
        .monbus_pkt_data(mon_master_wr_data[m]),
        .cfg_error_enable(cfg_mon_error_enable),
        .cfg_timeout_enable(cfg_mon_timeout_enable)
    );
end

// Aggregate monitor packets to single output
arbiter_rr_monbus #(N(NUM_MONITORS)) u_mon_arbiter (
    .i_valid({mon_master_wr_valid, mon_slave_rd_valid, ...}),
    .i_data({mon_master_wr_data, mon_slave_rd_data, ...}),
    .o_valid(bridge_monbus_valid),
    .o_data(bridge_monbus_data)
);

```

**Monitor Bus Output:** - Standard 64-bit monitor bus format - Per-packet type header (error, timeout, performance, etc.) - Can be routed to system-level monitoring infrastructure **Benefits:** - **Debug:** Catch issues during integration testing - **Performance Tuning:** Identify bottlenecks, optimize QoS weights - **Production Visibility:** Optional runtime monitoring (can disable for area/power) - **Standard Interface:** Compatible with existing rtl/amba/ monitor ecosystem **Trade-offs:** - Area: +5-10% per monitored interface (depending on enabled packet types) - Minimal performance impact (monitors are passive observers) - Can be completely compiled out if not used (via parameters) **Recommendation:** - Default: OFF (production configs) - Enable: For development, debug, performance analysis builds **See Also:** - rtl/amba/axi4/\*\_mon\*.sv - AXI4 monitor modules - rtl/amba/apb/apb\_monitor.sv - APB monitor - docs/AXI\_Monitor\_Configuration\_Guide.md - Monitor configuration best practices - rtl/amba/shared/arbiter\_rr\_monbus.sv - Monitor bus arbiter

---

## TASK-021: Testbench and Test File Automation

**Status:** Planned **Priority:** P2 **Effort:** 3-4 weeks **Owner:** Unassigned

**Description:** Automate generation of testbench and test files for bridge variants to reduce manual test creation effort.

**Current State:** - RTL Generation: Automated (bridge\_generator.py, bridge\_csv\_generator.py) - Test Parameter Generation: Partially automated (generate\_test\_params() functions) - Test File Creation: Manual (each test hand-written) - Testbench Classes: Manual (each TB class hand-written)

**Proposed Automation:** 1. **Test Template Generator:** Create templates for common test patterns 2. **TB Class Generator:** Auto-generate TB classes from bridge CSV configs 3. **Parameter Sweep:** Auto-generate parameter combinations for pytest 4. **Test Discovery:** Auto-discover and register generated tests

**Acceptance Criteria:** - [ ] Create test\_generator.py tool - [ ] Generate basic routing tests from CSV config - [ ] Generate OOO tests for appropriate slaves - [ ] Generate width converter integration tests - [ ] Auto-generate TB class from ports.csv - [ ] Integrate with existing pytest framework - [ ] Documentation for extending generated tests

**Related Files:** - bin/test\_generator.py (to be created) - dv/tests/templates/ (test templates directory) - dv/tbclasses/templates/ (TB class templates)

**Rationale:** Currently, adding a new bridge variant requires manually creating: 1. CSV configuration (automated input) 2. RTL generation (automated) 3. Test file (manual) 4. TB class customization (manual)

Steps 3-4 are repetitive and error-prone. Automating these would: - Reduce development time for new bridge variants - Ensure consistent test coverage across all variants - Eliminate copy-paste errors in test boilerplate - Allow focus on unique test scenarios vs boilerplate

**Implementation Notes:** - Leverage existing generate\_test\_params() patterns - Reuse BridgeAXI4FlatTB as base class - Generate tests for each master-slave combination in connectivity.csv - Support channel-specific tests (wr/rd/rw) - Include width converter test scenarios

---

## Task Dependencies

TASK-001 (Integration Examples) - Independent  
TASK-002 (Use Cases) - Independent  
TASK-003 (Performance) - Independent  
TASK-004 (Address Translation) - Independent  
TASK-005 (Error Handling) - Independent

TASK-006 (Wavedrom) - Independent  
TASK-007 (PlantUML) - Independent  
TASK-008 (Block Diagrams) - Independent  
TASK-009 (Configuration Guide) - Independent  
TASK-010 (Burst Analysis) - Independent

---

## Quick Commands

### Run Specification to PDF (when spec exists)

```
python bin/md_to_docx.py \
    projects/components/bridge/docs/bridge_spec/bridge_index.md \
    -o projects/components/bridge/docs/Bridge_Specification_v1.0.docx
\
    --toc --title-page --pdf
```

### Run Bridge Tests

```
# Run all bridge tests
pytest val/bridge/ -v

# Run APB-to-AXI tests
pytest val/bridge/test_apb_to_axi_bridge.py -v

# Run AXI-to-APB tests
pytest val/bridge/test_axi_to_apb_bridge.py -v
```

### Lint Bridge RTL

```
verilator --lint-only rtl/bridge/apb_to_axi_bridge.sv
verilator --lint-only rtl/bridge/axi_to_apb_bridge.sv
```

### Generate Wavedrom SVG

```
wavedrom-cli -i docs/bridge_spec/assets/waves/apb_to_axi_read.json \
    -s docs/bridge_spec/assets/waves/apb_to_axi_read.svg
```

### Generate PlantUML PNG

```
plantuml docs/bridge_spec/assets/puml/apb_to_axi_bridge_fsm.puml
```

### Measure Resource Usage (Vivado)

```
# Synthesize with Vivado and report utilization
vivado -mode batch -source scripts/synth_bridge.tcl
```

---

## Common Use Cases

### Use Case 1: RAPIDS-Style Accelerator Interconnect

**Scenario:** High-performance accelerator with dedicated write and read masters accessing shared DDR memory and APB control registers. **Solution:** Use **CSV bridge generator** with channel-specific masters:

```
rapids_descr_wr,master,axi4,wr,rapids_descr_m_axi_,512,64,8,N/A,N/A  
rapids_sink_wr,master,axi4,wr,rapids_sink_m_axi_,512,64,8,N/A,N/A  
rapids_src_rd,master,axi4,rd,rapids_src_m_axi_,512,64,8,N/A,N/A  
cpu_master,master,axi4,rw,cpu_m_axi_,64,32,4,N/A,N/A  
ddr_controller,slave,axi4,rw,ddr_s_axi_,512,64,8,0x80000000,0x80000000  
apb_periph0,slave,apb,rw,apb0_,32,32,N/A,0x00000000,0x00010000
```

**Benefits:** 35% overall resource reduction, custom prefixes, automatic width conversion

### Use Case 2: Multi-Processor SoC Interconnect

**Scenario:** Multiple CPU cores and DMA engines accessing shared memory and peripherals. **Solution:** Use **framework-based generator** for standard MxN topology:

```
python3 bridge_generator.py --masters 4 --slaves 3 --output ..//rtl/
```

**Benefits:** Array-indexed ports, proven architecture, hierarchical design

### Use Case 3: Mixed-Width Datapath

**Scenario:** 64-bit CPU and 512-bit DMA accessing 256-bit SRAM and 128-bit peripherals. **Solution:** Use **CSV bridge generator** with automatic width converter insertion: - Generator identifies width mismatches - Instantiates axi4\_dwidth\_converter\_wr and axi4\_dwidth\_converter\_rd - Handles upsize and downsize conversions **Benefits:** Automatic converter placement, no manual RTL editing

### Use Case 4: Protocol Bridging (Phase 3)

**Scenario:** AXI4 masters accessing legacy APB peripherals (UART, SPI, GPIO). **Solution:** Use **CSV bridge generator** with APB slaves (requires Phase 3): - Generator identifies protocol mismatch - Instantiates axi4\_to\_apb\_converter automatically - Maps AXI4 bursts to sequential APB transactions **Status:** Phase 3 pending (placeholders in place)

---

## Design Notes

### Crossbar Architecture

- **5-Channel Routing:** Complete AW, W, B, AR, R channel routing
- **Round-Robin Arbitration:** Per-slave fair arbitration with grant locking
- **ID-Based Responses:** Distributed RAM lookup tables for B/R routing
- **Out-of-Order Support:** Transaction ID tracking enables OOO responses
- **Burst-Aware:** Grant locked until WLAST/RLAST

### Performance Considerations

- **Latency:**
  - Direct path (no converter): 0-1 cycle routing latency
  - With width converter: +2-4 cycles per converter
  - With APB converter: +3-5 cycles (Phase 3)
- **Throughput:**
  - Full AXI4 burst throughput (no artificial stalls)
  - Width converters may limit throughput on narrow paths
  - Arbitration latency: 1-2 cycles per contention
- **Resource Usage:**
  - Minimal logic: MUXes + arbiters + ID tables
  - ID tables: Distributed RAM (LUT RAM on FPGAs)
  - Channel-specific masters save 35-60% ports

### CSV Generator Best Practices

- **Channel Selection:**
    - Use ‘wr’ for write-only masters (DMA write, descriptor writers)
    - Use ‘rd’ for read-only masters (DMA read, streaming sources)
    - Use ‘rw’ only when truly bidirectional (CPU, debug)
  - **Width Matching:**
    - Prefer native width matching when possible (avoid converters)
    - Group similar-width masters/slaves together
    - Document performance impact of width conversion
  - **Address Map:**
    - Use non-overlapping address ranges
    - Align base addresses to slave size
    - Document decode strategy
-

**Last Review:** 2025-10-29 **Next Review:** After Phase 3 completion or quarterly **See also:** - BRIDGE\_CURRENT\_STATE.md - Complete current state review - IMPLEMENTATION\_STATUS.md - High-level phase status - TESTING.md - Test compliance documentation