

# FELIX RD53 Readout Stress Test

CW 20

**PhD Status Report**  
**Matthias Drescher**

**Supervised by**  
**Prof. Dr. Arnulf Quadt**  
**Dr. Ali Skaf**

# Project Recap

- General project idea
  - Instead of lpGBT/RD53A ASICs, create a readout chain with emulators
    - Can reach a fully populated FELIX (24 links)
    - See if FELIX and the components above it are ready for larger systems



# Image of Setup



- Project status at time of last report
  - All (E-)Links locking
    - QSFP4 on the VCU128 not starting up reliably
  - Started preparing software for scan automation/performing tests
    - Necessary due to large number of frontends

```
lab34:~/Scripts$ flx-info link
Card type : FLX-712
Firmw type: PIXEL
```

#### Link alignment status

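| Channel | 0   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| Aligned | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES |

| Channel | 12  | 13  | 14  | 15  | 16  | 17  | 18  | 19  | 20  | 21  | 22  | 23  |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| Aligned | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES |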


```
E-link alignment status
-----
Endpoint 0 ('*'=aligned, '-'=not aligned)
LNK 0     8      16      24      32
0: *----*   *----*   *----*   *----*
1: *----*   *----*   *----*   *----*
2: *----*   *----*   *----*   *----*
3: *----*   *----*   *----*   *----*
4: *----*   *----*   *----*   *----*
5: *----*   *----*   *----*   *----*
6: *----*   *----*   *----*   *----*
7: *----*   *----*   *----*   *----*
8: *----*   *----*   *----*   *----*
9: *----*   *----*   *----*   *----*
10: *----*  *----*   *----*   *----*
11: *----*  *----*   *----*   *----*
-----
Endpoint 1 ('*'=aligned, '-'=not aligned)
LNK 0     8      16      24      32
0: *----*   *----*   *----*   *----*
1: *----*   *----*   *----*   *----*
2: *----*   *----*   *----*   *----*
3: *----*   *----*   *----*   *----*
4: *----*   *----*   *----*   *----*
5: *----*   *----*   *----*   *----*
6: *----*   *----*   *----*   *----*
7: *----*   *----*   *----*   *----*
8: *----*   *----*   *----*   *----*
9: *----*   *----*   *----*   *----*
10: *----*  *----*   *----*   *----*
11: *----*  *----*   *----*   *----*
-----
```


- Implemented scan automation and two kinds of tests for the thesis
- First test: scan single frontends individually
  - The visualization of the lost triggers shows a pattern for device 0 and device 1 that persists when links are swapped
  - The majority of frontends have >0 missing triggers in a given scan
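
As a minimal sketch of how the lost triggers from the single-frontend scans could be tabulated: the `(device, link)` addressing, the function names, and the example counts are all assumptions for illustration, not the project's actual scan software or measured data.

```python
from typing import Dict, Tuple

FrontendId = Tuple[int, int]  # (device, link): hypothetical addressing scheme


def lost_triggers(sent: int, received: Dict[FrontendId, int]) -> Dict[FrontendId, int]:
    """Return the number of missing triggers for each frontend."""
    return {fe: sent - n for fe, n in received.items()}


def summarize(lost: Dict[FrontendId, int]) -> Tuple[int, float]:
    """Return (number of frontends with >0 lost triggers, their fraction)."""
    failing = sum(1 for n in lost.values() if n > 0)
    return failing, failing / len(lost)


# Made-up counts, for illustration only.
received = {(0, 0): 1000, (0, 1): 998, (1, 0): 1000, (1, 1): 997}
lost = lost_triggers(sent=1000, received=received)
failing, fraction = summarize(lost)
print(lost)
print(failing, fraction)
```

Keeping the per-frontend dictionary keyed by position makes it easy to compare the pattern before and after swapping links.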



# Measurements

- Second test: Scan multiple frontends at once, including more and more frontends
  - Number of failing frontends rises (better than expectations)
- Full system scan possible (but only for 8000 triggers)
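
The second test can be thought of as a series of scans over growing sets of frontends. The sketch below only builds that scan plan; the frontend names and the step size are hypothetical, and the actual scan call is left out.

```python
from typing import List, Sequence


def incremental_subsets(frontends: Sequence[str], step: int) -> List[List[str]]:
    """Return growing prefixes of `frontends`, adding `step` frontends per scan."""
    if step < 1:
        raise ValueError("step must be >= 1")
    subsets = []
    # The last prefix always contains every frontend, even if the
    # total is not a multiple of `step`.
    for end in range(step, len(frontends) + step, step):
        subsets.append(list(frontends[:end]))
    return subsets


# Hypothetical frontend names, for illustration only.
frontends = [f"fe{i}" for i in range(5)]
plan = incremental_subsets(frontends, step=2)
for subset in plan:
    print(len(subset), subset)
```

Each entry of `plan` would be handed to one multi-frontend scan, so the number of failing frontends can be recorded as a function of system size.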



- **QSFP4 fibers not brought up correctly after startup**
- **Suspected the different clock source**
  - **QSFP4 refclock driven from FPGA fabric, not from dedicated chip**
- **Solution: Hack the board to make the Si5328 output available (done by Ali)**



# Link Stability Improvement



- Options to continue
  - Keep it like this
  - Design PCB with additional Si5328 to use as a clock source
  - Design FMC card with QSFPs + additional Si5328s (→ run stress test on VCU128 only)
  - Buy Si5328 evaluation board (~200 Euro)



- **Old RD53A emulator configuration scheme**
  - Use a DMA data stream broadcast to all instances to configure the emulators (write-only)
  - Single-bit yes/no response through flag
- **New RD53A emulator configuration scheme**
  - Custom AXI slave as interface for each emulator
    - Emulator configuration memory mapped into processor address space
  - Switching system done with Xilinx ‘AXI Interconnect’ IP
  - Allows reading values for runtime statistics
- **Reworked system configuration in the same way**
  - No need to implement an Ethernet command for each functionality
  - Implemented trigger generator in the same way
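
To illustrate what memory-mapped emulator configuration means in practice, here is a minimal sketch of the address arithmetic. The base address, the per-emulator stride, and the register offsets are assumptions made up for this example; only the register names are taken from the configuration tool output. The bus is a plain dictionary standing in for the processor address space.

```python
from typing import Dict

# All numeric values below are hypothetical, chosen for illustration only.
EMU_BASE = 0x4000_0000    # assumed base address of emulator 0
EMU_STRIDE = 0x1000       # assumed address window per emulator
REG_OFFSETS = {"output_enable": 0x00, "TTC_status": 0x04}  # assumed offsets


def reg_address(emu_index: int, name: str) -> int:
    """Compute the bus address of register `name` in emulator `emu_index`."""
    return EMU_BASE + emu_index * EMU_STRIDE + REG_OFFSETS[name]


class FakeBus:
    """Stand-in for the processor address space (32-bit registers)."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value & 0xFFFF_FFFF

    def read(self, addr: int) -> int:
        return self.mem.get(addr, 0)


bus = FakeBus()
bus.write(reg_address(3, "output_enable"), 1)
print(hex(reg_address(3, "output_enable")), bus.read(reg_address(3, "output_enable")))
```

Unlike the old broadcast scheme, every emulator has its own address window here, so values can be read back individually.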



```
(venv) lab34:/work1/mdrescher/stress_test_sw/apps$ ./sys-config.py kcu116
Opening board with IP 192.169.1.10 on port 7
Register hw_version          = 1    = 0x1
Register num_fes              = 1    = 0x1
Register num_lpgbts           = 1    = 0x1
Register lpgbt_cfg_override   = 0    = 0x0
Register lpgbt_cfg_write_enable = 0    = 0x0
Register lpgbt_cfg_address    = 0    = 0x0
Register lpgbt_cfg_data       = 0    = 0x0
Register mgt_RST_tertiary     = 0    = 0x0
Register mgt_RST_secondary    = 0    = 0x0
Register mgt_RST_primary      = 0    = 0x0
Register FEC12_not_FEC5       = 0    = 0x0
Register 10g_not_5g            = 1    = 0x1
Register sc i2c                = 0    = 0x0
Register bp_DL_interleaver    = 0    = 0x0
Register bp_DL_scrambler      = 0    = 0x0
Register bp_DL_fec             = 0    = 0x0
Register bp_UL_interleaver    = 0    = 0x0
Register bp_UL_scrambler      = 0    = 0x0
Register bp_UL_fec             = 0    = 0x0
Register fw_version            = 1    = 0x1
Register si5328 locking        = 1    = 0x1
Register tc_enable              = 0    = 0x0
Register tc_limited_count      = 0    = 0x0
Register tc_trigger_count      = 0    = 0x0
Register tc_wait_time           = 0    = 0x0
Register tc_rem_triggers        = 0    = 0x0
```

Register groups annotated on the slide, from top to bottom:

- Hardware information
- lpGBT configuration access
- MGT reset
- Board control (formerly VIO)
- Debug options (formerly VIO)
- Firmware information
- Trigger generator: inject N or an infinite number of evenly spaced triggers  
  → Useful for external trigger scans/debugging
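
The trigger generator behaviour (N or infinitely many evenly spaced triggers) can be modelled in a few lines. This is a software sketch only: the parameter names loosely follow the `tc_*` registers, but their exact semantics (for example, that the wait time is the period between consecutive triggers in clock cycles) are assumptions.

```python
from itertools import count, islice
from typing import Iterator


def trigger_times(wait_time: int, trigger_count: int = 0,
                  limited: bool = True) -> Iterator[int]:
    """Yield clock-cycle timestamps of evenly spaced triggers.

    With limited=True exactly `trigger_count` triggers are produced;
    with limited=False the stream never ends.
    """
    if wait_time < 1:
        raise ValueError("wait_time must be >= 1 clock cycle")
    times = count(start=0, step=wait_time)
    return islice(times, trigger_count) if limited else times


# Five triggers, 40 clock cycles apart.
finite = list(trigger_times(wait_time=40, trigger_count=5))
print(finite)

# Infinite mode: take only the first three timestamps.
first_three = list(islice(trigger_times(wait_time=40, limited=False), 3))
print(first_three)
```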

```
(venv) lab34:/work1/mdrescher/stress_test_sw/apps$ ./fe-config.py kcu116 0 0
Opening board with IP 192.169.1.10 on port 7
Value of output_enable            for FE (0, 0) = 1      = 0x1
Value of cal_override             for FE (0, 0) = 0      = 0x0
Value of reset_hitgen             for FE (0, 0) = 0      = 0x0
Value of write_enable             for FE (0, 0) = 0      = 0x0
Value of BRAM address             for FE (0, 0) = 0      = 0x0
Value of TTC_status               for FE (0, 0) = 1      = 0x1
Value of stats_wait_idle          for FE (0, 0) = 0      = 0x0
Value of stats_start              for FE (0, 0) = 0      = 0x0
Value of stats_stop               for FE (0, 0) = 0      = 0x0
Value of stats_reset              for FE (0, 0) = 0      = 0x0
Value of stats_meas_duration      for FE (0, 0) = 0      = 0x0
Value of stats_triggers_remaining for FE (0, 0) = 0      = 0x0
Value of stats_measuring          for FE (0, 0) = 0      = 0x0
Value of stats_finished           for FE (0, 0) = 0      = 0x0
Value of stats_data_blocks_hi     for FE (0, 0) = 0      = 0x0
Value of stats_data_blocks_lo     for FE (0, 0) = 0      = 0x0
Value of stats_total_blocks_hi    for FE (0, 0) = 0      = 0x0
Value of stats_total_blocks_lo    for FE (0, 0) = 0      = 0x0
Value of stats_sent_triggers      for FE (0, 0) = 0      = 0x0
Value of stats_N_ratio            for FE (0, 0) = 50     = 0x32
```

Register groups annotated on the slide, from top to bottom:

- Emulator output control
- Hit data write control
- TTC (input) valid?
- Stats. measurement control
- Stats. state machine status
- Stats. results
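
The statistics results come as `_hi`/`_lo` register pairs. A minimal sketch of how such pairs could be recombined on the software side, assuming each register is 32 bits wide and the pair forms one 64-bit counter (both are assumptions, as are the example values):

```python
from typing import Dict, Optional


def combine_hi_lo(hi: int, lo: int, width: int = 32) -> int:
    """Join two `width`-bit register values into one counter."""
    mask = (1 << width) - 1
    return ((hi & mask) << width) | (lo & mask)


def data_fraction(regs: Dict[str, int]) -> Optional[float]:
    """Fraction of output blocks that carried data, or None if nothing was counted."""
    data = combine_hi_lo(regs["stats_data_blocks_hi"], regs["stats_data_blocks_lo"])
    total = combine_hi_lo(regs["stats_total_blocks_hi"], regs["stats_total_blocks_lo"])
    return data / total if total else None


# Made-up register values, for illustration only.
regs = {
    "stats_data_blocks_hi": 0, "stats_data_blocks_lo": 250,
    "stats_total_blocks_hi": 0, "stats_total_blocks_lo": 1000,
}
print(combine_hi_lo(1, 0))
print(data_fraction(regs))
```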

- Rewrote RD53A emulator tests using a proper HDL testing framework
  - Wanted to check whether RD53A emulator is responsible for missing triggers
  - Can provide stronger tests of the logic with smaller effort
- Testbench concept



- Previous testbenches were written in VHDL itself
  - Low level language → takes longer to write, harder to analyse response/generate stimulus  
→ Old tests only covered code as needed for development

- New testing framework: cocotb
  - Enables us to write tests in Python
    - Stimulus and response can be implemented as parametrized functions
    - Can read/write to file, generate random numbers
    - Can include stress test software library to generate test events
  - Integration into pytest: automated running of all tests
- Downside: not able to work with Xilinx simulator (use GHDL currently)
  - Removed Xilinx IPs from project (hit data BRAM now outside of emu.)
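
The idea of parametrized stimulus and response functions can be sketched in plain Python, without a simulator. In a cocotb test, helpers like these would be called from the test coroutine that drives the design. All names here are hypothetical, and the "expected" model is a trivial stand-in (one event per trigger), not the RD53A emulator logic.

```python
import random
from typing import List


def random_trigger_pattern(n_cycles: int, rate: float, seed: int) -> List[int]:
    """Return one 0/1 trigger flag per clock cycle, reproducible via `seed`."""
    rng = random.Random(seed)
    return [1 if rng.random() < rate else 0 for _ in range(n_cycles)]


def expected_event_count(pattern: List[int]) -> int:
    """Reference model: every trigger should produce exactly one event."""
    return sum(pattern)


def check_response(pattern: List[int], observed_events: int) -> None:
    """Fail with a readable message if the design lost or added events."""
    expected = expected_event_count(pattern)
    assert observed_events == expected, (
        f"expected {expected} events, observed {observed_events}"
    )


pattern = random_trigger_pattern(n_cycles=100, rate=0.2, seed=1)
# Stand-in for the simulated design: here it simply answers every trigger.
observed = sum(pattern)
check_response(pattern, observed)
print(len(pattern), observed)
```

Because the stimulus depends only on its parameters and the seed, the same helper could be reused for several rates and seeds, for example through `pytest.mark.parametrize`.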



- Status: rewrote all tests and have working emulator again (without IPs)

```
matthias@matthias-UX410UQK:~/VivadoProjects/tiny_rd53a_emu$ pytest test_all.py
===== test session starts =====
platform linux -- Python 3.8.10, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/matthias/VivadoProjects/tiny_rd53a_emu
plugins: cocotb-test-0.2.4
collected 10 items

test_all.py ..........

===== 10 passed in 8.92s =====
```

- Tests are much stronger now (tests assert/reconstruct output)
- Did not find bug, but removed small idle gaps in output

Headers (blue) and data (purple) are now tightly packed



```
2781000000000ada
2781000000000ada
2781000000000ada
21e0400002000026
10220002802100027
10240802a02300029
10260802c0250802b
10281002e0270802d
17120ff6f0291002f
1817dff674a1ff6f
1ad60fff69067f6ff
102b1003102a10030
102d1803302c18032
102f1803502e18034
21e00000000000000
2781000000000ada
2781000000000ada
```

- Sadly, had no effect on the lost trigger problem

- Create stable release with prebuilt bitstreams and a central documentation page
  - Can use as reference during development
  - Allows others to try out/use the project without hassle or needing dev. knowledge
- Debug missing triggers seen in YARR
  - Upgraded to new YARR version
    - Different results depending on whether logging is enabled, on sleep times, and on the trigger loop frequency
  - Upgrade to NetIO-next/felixstar (YARR adapter done by Wael)  
    → Also upgrade the firmware, perform more tests, look inside the different components to see where triggers get lost
- Implement ITkPixV2 emulation
  - Read the manual again and started planning the structure
  - Needs some R&D on stream processing before starting the actual implementation
  - Can move 4 lpGBTs to ZCU106 (using 4-SFP FMC) if more space is needed (KCU116 @~50%)
- Project shown at first ITk Online Software meeting

- **Summary**
  - Performed some first system tests for master thesis
  - Solved QSFP4 start up issues
  - New configuration scheme
    - (= 3 new custom IPs, AXI bus wiring, new reg.maps, new Ethernet protocol, adapted firmware/software)
- **Outlook**
  - Create documentation web page
  - Debug missing triggers
  - Implement RD53B/ITkPix emulation
- **Other**
  - Need QT
  - Waiting to register in GAUSS

  
**THANX**  
for your attention!

