

# **Traineeships in Advanced Computing for High Energy Physics (TAC-HEP)**

**GPU & FPGA module training: Part-2**

**Week-6: LHC, CMS Level-1 Trigger, Project**

**Lecture-11: April 25<sup>th</sup> 2023**



Varun Sharma

University of Wisconsin – Madison, USA




# So Far...



- **FPGA and its architecture**
  - Register/Flip-Flops, LUTs/Logic Cells, DSP, BRAMs
  - Clock Frequency, Latency
  - Extracting control logic & Implementing I/O ports
- **Parallelism in FPGA**
  - Scheduling, Pipelining, DataFlow
- **Vivado HLS**
  - Introduction, Setup, Hands-on for GUI/CLI, Introduction to Pragmas
  - Different Pragmas and their effects on performance
  - Practices to follow while writing HLS code – do's & don'ts

## Today:

- LHC, CMS Experiment
- Level-1 Trigger
- Project



TAC-HEP 2023

# LHC, CMS Experiment

# Large Hadron Collider



The LHC accelerates bunches of protons (of order $10^{11}$ per bunch), or ions, from the 450 GeV injection energy delivered by the SPS to 6.8 TeV per beam and collides them at **13.6 TeV** centre-of-mass energy

The LHC circumference is 27 km and the minimum distance between bunches is  $25 \text{ ns} \times c \approx 7.5 \text{ m}$

- Revolution frequency of LHC is 11.24 kHz
- Bunch crossing rate (ZeroBias rate) depends on number of bunches in the machine
- E.g. For 2380 colliding bunches (2023)
  - ZeroBias rate = 26.8 MHz
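
The quoted ZeroBias rate follows from the revolution frequency and the number of colliding bunches. A minimal check using only the numbers on this slide (the function name is illustrative, not part of any CMS software):

```python
REVOLUTION_FREQUENCY_HZ = 11.24e3  # LHC revolution frequency quoted above


def zero_bias_rate_hz(n_colliding_bunches):
    """Bunch crossing (ZeroBias) rate: each colliding bunch crosses once per turn."""
    return n_colliding_bunches * REVOLUTION_FREQUENCY_HZ


rate_2023 = zero_bias_rate_hz(2380)
print(f"ZeroBias rate for 2380 bunches: {rate_2023 / 1e6:.1f} MHz")
```

The nominal 40 MHz figure corresponds to every 25 ns slot being filled; with gaps in the filling scheme the real crossing rate is lower.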



# CMS Detector



The LHC collides bunches of protons at **40 MHz\***

We **can't** read out all collisions (zero-suppressed data would be  $\sim 30 \text{ TB/s}$ )

**Primary requirement of the CMS detector is to store all interesting events (data)**



# CMS Trigger



CMS cannot read out the full raw detector data (RAW) in all bunch crossings ( $\sim 40$  MHz)

**CMS Trigger:** Reduces the event rate from the LHC collision rate to what can be stored and analysed offline while keeping the physics reach of the experiment

## Two-level triggering:

- **Level-1 Trigger (L1T)**
  - Rate down to  $\sim 100$  kHz
  - Uses physics criteria and only reduced event information (no tracker, reduced ECAL resolution)
  
- **High Level Trigger (HLT)**
  - Rate down to  $\sim 1$  kHz
  - Performs a simplified event reconstruction using full RAW event
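
To make the size of each reduction step concrete, the rates above can be turned into rejection factors. This is a small arithmetic sketch; the constant and function names are illustrative.

```python
BUNCH_CROSSING_RATE_HZ = 40_000_000  # nominal LHC collision rate (~40 MHz)
L1T_OUTPUT_RATE_HZ = 100_000         # Level-1 Trigger output (~100 kHz)
HLT_OUTPUT_RATE_HZ = 1_000           # High Level Trigger output (~1 kHz)


def reduction_factor(rate_in_hz, rate_out_hz):
    """How many input events there are for each event that is kept."""
    return rate_in_hz / rate_out_hz


l1t_factor = reduction_factor(BUNCH_CROSSING_RATE_HZ, L1T_OUTPUT_RATE_HZ)
hlt_factor = reduction_factor(L1T_OUTPUT_RATE_HZ, HLT_OUTPUT_RATE_HZ)
print(f"L1T keeps about 1 in {l1t_factor:.0f} crossings")
print(f"HLT keeps about 1 in {hlt_factor:.0f} L1-accepted events")
print(f"Overall: about 1 in {l1t_factor * hlt_factor:.0f}")
```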



# CMS Level-1 Trigger



- Collision data are buffered locally for  $< 4\mu\text{s}$
- Level-1 trigger receives data with coarse granularity from
  - Calorimeters (ECAL, HCAL, HF)
  - Muon systems (CSC, DT, RPC, GEM)
- It is implemented in hardware
  - Mostly uses field programmable gate arrays (FPGAs)
  - Operates synchronously to the LHC clock (40 MHz)
- Detector readout possible at  $< 100 \text{ kHz}$



# Current Level-1 Trigger Design



# Level-1 Calorimeter Trigger



- **Two-layer calorimeter trigger**
  - Tower-level calibrations
  - Pileup subtraction
  - Independent calibrations for jets, taus, e/gamma
- **Calo Layer – 1**
  - Combines inputs from ECAL & HCAL
  - Apply calibrations
- **Calo Layer – 2**
  - Find Physics candidates: Jets, taus, & e/gamma
  - For each object, applies pileup subtraction, computes isolation, applies object-based calibrations
  - Computes global quantities: transverse energy, missing energy, etc.






# Upgrade of Level-1 Trigger

## High Luminosity LHC

Latency:  $4\mu\text{s} \Rightarrow 12.5\mu\text{s}$

Output:  $100\text{kHz} \Rightarrow 750\text{ kHz}$
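
Since collision data must be buffered until the Level-1 decision arrives, the latency can be read as a pipeline depth in bunch crossings. A short sketch, assuming the 25 ns bunch spacing quoted earlier (names are illustrative):

```python
BUNCH_SPACING_NS = 25  # minimum spacing between LHC bunches


def latency_in_bunch_crossings(latency_ns):
    """Number of 25 ns bunch crossings that fit into the trigger latency."""
    return latency_ns // BUNCH_SPACING_NS


current_depth = latency_in_bunch_crossings(4_000)    # 4 us (current system)
phase2_depth = latency_in_bunch_crossings(12_500)    # 12.5 us (HL-LHC upgrade)
print(f"Current L1T: {current_depth} crossings in flight")
print(f"Phase-2 L1T: {phase2_depth} crossings in flight")
```

The longer Phase-2 latency gives the algorithms roughly three times as many clock cycles per decision, at the cost of deeper front-end buffers.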

# Phase-2 Level-1 Trigger



# Phase-2 Calorimeter Level-1 Trigger



## INPUTS:

- ECAL crystals
- HCAL towers



# Phase-2 Calorimeter Level-1 Trigger



## Outputs (RCT)

- Clustered energy
- Unclustered energy



# Barrel Calorimeter Granularity



**Rotated 90 degrees**



$\eta$ : Pseudorapidity

$\varphi$ : Azimuthal angle

Calorimeter:  $2 \times (17\eta \times 72\varphi)$
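
The granularity expression can be expanded into channel counts. The sketch below assumes 5 x 5 ECAL crystals behind each tower, which is the segmentation used in the RCT input description later in this lecture; the function names are illustrative.

```python
ETA_HALVES = 2               # the leading factor 2 in 2 x (17 eta x 72 phi)
ETA_TOWERS_PER_HALF = 17
PHI_TOWERS = 72
CRYSTALS_PER_TOWER = 5 * 5   # assumption: 5 x 5 crystals per tower


def barrel_tower_count():
    """Total number of barrel trigger towers."""
    return ETA_HALVES * ETA_TOWERS_PER_HALF * PHI_TOWERS


def barrel_crystal_count():
    """Total number of ECAL barrel crystals under the 5 x 5 assumption."""
    return barrel_tower_count() * CRYSTALS_PER_TOWER


print(f"Barrel towers:   {barrel_tower_count()}")
print(f"Barrel crystals: {barrel_crystal_count()}")
```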


# Pseudorapidity



**Pseudorapidity ( $\eta$ ):** Spatial coordinate describing the angle of a particle relative to the beam axis

$$\eta \equiv -\ln \left[ \tan \left( \frac{\theta}{2} \right) \right]$$

Where  $\theta$  is the angle between the particle three-momentum and the positive direction of the beam axis
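
A direct transcription of this definition, together with its inverse, may help when converting between detector angles and $\eta$. This is a minimal sketch; the function names are illustrative.

```python
import math


def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)) for a polar angle theta in radians, 0 < theta < pi."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


for degrees in (90, 45, 10):
    eta = pseudorapidity(math.radians(degrees))
    print(f"theta = {degrees:3d} deg  ->  eta = {eta:.3f}")
```

A particle perpendicular to the beam ($\theta = 90°$) has $\eta = 0$, and $|\eta|$ grows without bound as the direction approaches the beam axis.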




# Barrel Calorimeter Segmentation



# Level-1 Calo Trigger Design



- Tiled multi-layer architecture
- Regional Calorimeter Trigger (RCT): Regional Layer partitions the detector and forms regional clusters
  - No sharing between RCT cards
  - All algorithms are regional
  - Regional sums are combined at GCT
- Global Calorimeter Trigger (GCT): Global Layer stitches neighbouring clusters and forms detector-wide triggerable objects (electrons, jets, taus, ET, HT, ETmiss etc.)
  - No sharing of input data between GCT cards
  - All calculations are done in each card



TAC-HEP 2023

# Regional Calorimeter Trigger

## RCT

# RCT: Overview



Each RCT card covers a geometry of  $17\eta \times 4\varphi$  and receives crystal energy as input from ECAL and tower energy from HCAL



## Input

- $17 \times 4 \times 5 \times 5$  ECAL crystal energies
- $17 \times 4$  HCAL tower energies

| Input         | Word size | Bit fields                 |
|---------------|-----------|----------------------------|
| ECAL crystals | 16b       | 10 ET + 5 timing + 1 Spike |
| HCAL towers   | 16b       | 10 ET + 6 feature bits     |
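
These word sizes fix the raw input payload of one RCT card. The estimate below assumes one 16-bit word per crystal and per tower in every bunch crossing at 40 MHz, and ignores any link encoding or protocol overhead; both are assumptions for illustration, not statements about the actual link format.

```python
TOWERS_PER_CARD = 17 * 4             # 17 eta x 4 phi towers per RCT card
CRYSTALS_PER_TOWER = 5 * 5           # 5 x 5 ECAL crystals per tower
WORD_BITS = 16                       # both ECAL and HCAL words are 16 bits
BUNCH_CROSSING_RATE_HZ = 40_000_000  # assumed: one word set per crossing


def rct_input_bits_per_crossing():
    """Return (ecal_bits, hcal_bits) arriving at one RCT card per crossing."""
    ecal_bits = TOWERS_PER_CARD * CRYSTALS_PER_TOWER * WORD_BITS
    hcal_bits = TOWERS_PER_CARD * WORD_BITS
    return ecal_bits, hcal_bits


ecal_bits, hcal_bits = rct_input_bits_per_crossing()
total_bits = ecal_bits + hcal_bits
print(f"ECAL: {ecal_bits} bits, HCAL: {hcal_bits} bits per crossing")
print(f"Raw payload: {total_bits * BUNCH_CROSSING_RATE_HZ / 1e12:.2f} Tb/s per card")
```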

## Functionality

- Make EG towers/clusters
- ECAL + HCAL
- HoE
- ...

## Output

- $17 \times 4$  towers
  - Tower  $E_T$ , Cluster  $E_T$
  - Seed Eta, Phi, Time, Isolation
  - HoE

# RCT: Algorithm



**Input:** 17x4x5x5 ECAL crystals and 17x4 HCAL towers.

- One RCT card covers  $17\eta \times 4\varphi$  towers

1. Divide the card into regions of  $3\eta \times 4\varphi$  towers to make clusters.
2. Build clusters in each  $3\eta \times 4\varphi$  region:
  - Search for a seed crystal  $> 1$  GeV
  - Make 3x5 clusters at crystal level
  - Select a maximum of the 5 highest-ET clusters in the  $3\eta \times 4\varphi$  region
3. Move to the next 3x4 towers, then merge with neighbouring clusters if a cluster lies at the boundary of the tower region
4. One RCT card has five  $3\eta \times 4\varphi$  regions plus one  $2\eta \times 4\varphi$  region ( $17 = 5 \times 3 + 2$ ), giving up to  $6 \times 5 = 30$  clusters
5. Sort and send a maximum of the 12 highest-ET clusters
6. For each of these 12 clusters, if there is an HCAL tower behind the ECAL tower, the HCAL ET is also added to the cluster.
- **Output:** 12 clusters (ECAL + HCAL)
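
The seed-and-cluster steps can be sketched in software before moving to HLS. The version below is a simplified illustration, not the firmware algorithm: it assumes a 3x5 cluster spans 3 crystals in eta and 5 in phi, that crystals used by one cluster are not reused, and it leaves out the boundary merging of step 3 and the HCAL addition of step 6. All function names are hypothetical.

```python
SEED_THRESHOLD_GEV = 1.0        # seed crystal threshold from step 2
MAX_CLUSTERS_PER_REGION = 5     # step 2
MAX_CLUSTERS_PER_CARD = 12      # step 5


def cluster_region(crystals):
    """Return up to 5 (cluster_et, seed_eta, seed_phi) tuples, highest ET first.

    crystals[i][j] is the crystal ET in GeV at eta index i and phi index j.
    Assumption: the 3x5 window is 3 crystals in eta by 5 crystals in phi.
    """
    grid = [list(row) for row in crystals]  # work on a copy
    n_eta, n_phi = len(grid), len(grid[0])
    clusters = []
    while len(clusters) < MAX_CLUSTERS_PER_REGION:
        # The highest remaining crystal is the next seed candidate.
        seed_et, seed_eta, seed_phi = max(
            (grid[i][j], i, j) for i in range(n_eta) for j in range(n_phi)
        )
        if seed_et <= SEED_THRESHOLD_GEV:
            break  # no remaining crystal passes the seed threshold
        cluster_et = 0.0
        for i in range(max(0, seed_eta - 1), min(n_eta, seed_eta + 2)):
            for j in range(max(0, seed_phi - 2), min(n_phi, seed_phi + 3)):
                cluster_et += grid[i][j]
                grid[i][j] = 0.0  # mark the crystal as used
        clusters.append((cluster_et, seed_eta, seed_phi))
    return sorted(clusters, reverse=True)


def select_card_clusters(region_clusters):
    """Merge the per-region lists and keep the 12 highest-ET clusters."""
    merged = [c for clusters in region_clusters for c in clusters]
    return sorted(merged, reverse=True)[:MAX_CLUSTERS_PER_CARD]


# One 3 eta x 4 phi tower region corresponds to 15 x 20 crystals.
region = [[0.0] * 20 for _ in range(15)]
region[7][10] = 10.0   # seed crystal
region[7][11] = 2.0    # neighbour inside the 3x5 window
region[2][2] = 3.0     # second, isolated seed
region[0][19] = 0.5    # below threshold, never seeds a cluster
print(cluster_region(region))
```

A Python reference like this is mainly useful for producing test vectors; the HLS version would replace the `while` loop and the global `max` search with fixed-bound loops that can be unrolled and pipelined.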



# RCT: Clustering



Make up to 5 clusters in each region: one  $2\eta \times 5\varphi$  region and five  $3\eta \times 5\varphi$  regions



1. Building clusters in  $3\eta \times 4\varphi$  region:
  - Search for seed crystal  $> 1$  GeV
  - Make  $3 \times 5$  clusters at crystal level
  - Select maximum of 5 highest ET cluster in  $3\eta \times 4\varphi$  region

Look for a seed crystal and then make a  $3 \times 5$  cluster around it

# RCT: Example



July 17, 2020




# Hardware (FPGA)

# Advanced Processor Demonstrator-1



**Xilinx FPGA:**  
xcvu9p-flgc2104-1-e

# VU9P (C2104) Bank Diagram



Super Logic Region (SLR) Crossing

# FPGA Floor Diagram



| MGT # | Note | L1T FW Link # |  |  | L1T FW Link # | Note | MGT # |
|-------|------|---------------|--|--|---------------|------|-------|
| X0Y59 |  | 47 |  |  | 95 |  | X1Y59 |
| X0Y58 |  | 46 |  |  | 94 |  | X1Y58 |
| X0Y57 |  | 45 |  |  | 93 |  | X1Y57 |
| X0Y56 |  | 44 |  |  | 92 |  | X1Y56 |
| X0Y55 |  | 43 |  |  | 91 |  | X1Y55 |
| X0Y54 |  | 42 |  |  | 90 |  | X1Y54 |
| X0Y53 |  | 41 |  |  | 89 |  | X1Y53 |
| X0Y52 |  | 40 |  |  | 88 |  | X1Y52 |
| X0Y51 |  | 39 |  |  | 87 |  | X1Y51 |
| X0Y50 |  | 38 |  |  | 86 |  | X1Y50 |
| X0Y49 |  | 37 |  |  | 85 |  | X1Y49 |
| X0Y48 |  | 36 |  |  | 84 |  | X1Y48 |
| X0Y47 |  | 35 |  |  | 83 |  | X1Y47 |
| X0Y46 |  | 34 |  |  | 82 |  | X1Y46 |
| X0Y45 |  | 33 |  |  | 81 |  | X1Y45 |
| X0Y44 |  | 32 |  |  | 80 |  | X1Y44 |
| X0Y43 |  | 31 |  |  | 79 |  | X1Y43 |
| X0Y42 |  | 30 |  |  | 78 |  | X1Y42 |
| X0Y41 |  | 29 |  |  | 77 |  | X1Y41 |
| X0Y40 |  | 28 |  |  | 76 |  | X1Y40 |
| X0Y39 |  | 27 |  |  | 75 |  | X1Y39 |
| X0Y38 |  | 26 |  |  | 74 |  | X1Y38 |
| X0Y37 |  | 25 |  |  | 73 |  | X1Y37 |
| X0Y36 |  | 24 |  |  | 72 |  | X1Y36 |
| X0Y35 |  | 23 |  |  | 71 |  | X1Y35 |
| X0Y34 |  | 22 |  |  | 70 |  | X1Y34 |
| X0Y33 |  | 21 |  |  | 69 |  | X1Y33 |
| X0Y32 |  | 20 |  |  | 68 |  | X1Y32 |
| X0Y31 |  | - |  |  | 67 |  | X1Y31 |
| X0Y30 |  | - |  |  | 66 |  | X1Y30 |
| X0Y29 |  | - |  |  | 65 |  | X1Y29 |
| X0Y28 |  | - |  |  | 64 |  | X1Y28 |
| X0Y27 |  | 19 |  |  | 63 |  | X1Y27 |
| X0Y26 |  | 18 |  |  | 62 |  | X1Y26 |
| X0Y25 |  | 17 |  |  | 61 |  | X1Y25 |
| X0Y24 |  | 16 |  |  | 60 |  | X1Y24 |
| X0Y23 |  | 15 |  |  | 59 |  | X1Y23 |
| X0Y22 |  | 14 |  |  | 58 |  | X1Y22 |
| X0Y21 |  | 13 |  |  | 57 |  | X1Y21 |
| X0Y20 |  | 12 |  |  | 56 |  | X1Y20 |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
| X0Y15 |  | 11 |  |  | 55 |  | X1Y15 |
| X0Y14 |  | 10 |  |  | 54 |  | X1Y14 |
| X0Y13 |  | 9 |  |  | 53 |  | X1Y13 |
| X0Y12 |  | 8 |  |  | 52 |  | X1Y12 |
| X0Y11 |  | 7 |  |  | 51 |  | X1Y11 |
| X0Y10 |  | 6 |  |  | 50 |  | X1Y10 |
| X0Y9 |  | 5 |  |  | 49 |  | X1Y9 |
| X0Y8 |  | 4 |  |  | 48 |  | X1Y8 |
| X0Y7 |  | 3 |  |  | - | Reserved for DAQ | X1Y7 |
| X0Y6 |  | 2 |  |  | - | Reserved for DAQ | X1Y6 |
| X0Y5 |  | 1 |  |  | - | Reserved for DAQ | X1Y5 |
| X0Y4 |  | 0 |  |  | - | not bonded out at chip level | X1Y4 |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
| N/A | not bonded out at chip level | - |  |  | - | not bonded out at chip level | N/A |
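
The link numbering in this table is piecewise linear in the MGT row, so it can be captured in a small lookup. The sketch below writes MGT sites in the usual Xilinx `X<column>Y<row>` form with digits 0 and 1 for the column; the range table is read off the rows above and the function name is hypothetical.

```python
# (first link, last link, MGT column, offset) with MGT row = link + offset
LINK_RANGES = [
    (0, 11, 0, 4),      # links 0-11   -> X0Y4  .. X0Y15
    (12, 19, 0, 8),     # links 12-19  -> X0Y20 .. X0Y27
    (20, 47, 0, 12),    # links 20-47  -> X0Y32 .. X0Y59
    (48, 55, 1, -40),   # links 48-55  -> X1Y8  .. X1Y15
    (56, 95, 1, -36),   # links 56-95  -> X1Y20 .. X1Y59
]


def mgt_site(link):
    """Return the MGT site name for an L1T firmware link number (0-95)."""
    for first, last, column, offset in LINK_RANGES:
        if first <= link <= last:
            return f"X{column}Y{link + offset}"
    raise ValueError(f"link {link} is not in the 0-95 range used by the table")


for link in (0, 19, 20, 48, 95):
    print(f"link {link:2d} -> {mgt_site(link)}")
```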

# Some more information



## APx – Firmware Shell






# Global Calorimeter Trigger

## GCT

# Barrel GCT



- Each GCT card takes input from 6 unique RCT cards plus 2 neighbouring cards (one on each side in phi), in each eta half
- 1 GCT card:  $(1 + 6 + 1) \times 2 = 16$  RCT regions

1 GCT card: 16 RCT cards. In each eta half: 6 RCT + 1 (left boundary) + 1 (right boundary)
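
The input sharing can be illustrated with a small indexing sketch. It assumes the 18 RCT cards of one eta half (72 phi towers, 4 per card) are numbered 0 to 17 in phi order and that GCT card `g` owns cards `6g` to `6g + 5`; this numbering is an assumption made for illustration, not the actual hardware mapping.

```python
RCT_CARDS_PER_ETA_HALF = 72 // 4   # 72 phi towers, 4 per RCT card
UNIQUE_RCT_PER_GCT = 6
ETA_HALVES = 2


def gct_inputs(gct_index):
    """RCT card indices (one eta half) feeding a GCT card, in phi order.

    The first and last entries are the boundary cards shared with the
    neighbouring GCT cards; phi wraps around, hence the modulo.
    """
    first = gct_index * UNIQUE_RCT_PER_GCT
    unique = [first + k for k in range(UNIQUE_RCT_PER_GCT)]
    left = (first - 1) % RCT_CARDS_PER_ETA_HALF
    right = (first + UNIQUE_RCT_PER_GCT) % RCT_CARDS_PER_ETA_HALF
    return [left] + unique + [right]


n_gct_cards = RCT_CARDS_PER_ETA_HALF // UNIQUE_RCT_PER_GCT
for g in range(n_gct_cards):
    cards = gct_inputs(g)
    print(f"GCT {g}: {cards} x {ETA_HALVES} eta halves = {len(cards) * ETA_HALVES} inputs")
```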



# Barrel GCT



Barrel GCT receives input from RCT

- Stitch together RCT regions in eta and phi direction
- Sends clustered energy post stitching to Correlator layer 1
- Make Physics objects – Egamma, Jets, Tau, MET (Ex, Ey), and pass onto Global Trigger

Separate projects are built for each step to produce different RTL blocks, which are then merged into a single bit file with help from our engineers.




# Project

# Project: Re-designing RCT



- Write the Regional Calorimeter Trigger algorithm for a new segmentation of the calorimeter and make a bit file (steps to make the bit file to be discussed tomorrow)
- New RCT:  $17\eta \times 6\varphi$  (instead of  $17\eta \times 4\varphi$ )
- Total of 24 RCT cards needed instead of 36
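
The card counts follow from the barrel granularity of $2 \times (17\eta \times 72\varphi)$ towers: each card spans all 17 eta towers of one half, so only the phi width changes the count. A minimal check (the function name is illustrative):

```python
def rct_cards_needed(phi_towers_per_card, total_phi_towers=72, eta_halves=2):
    """Number of RCT cards when each card covers all 17 eta towers of one half."""
    if total_phi_towers % phi_towers_per_card != 0:
        raise ValueError("card width must divide the number of phi towers")
    return eta_halves * (total_phi_towers // phi_towers_per_card)


print(f"17 eta x 4 phi cards: {rct_cards_needed(4)}")
print(f"17 eta x 6 phi cards: {rct_cards_needed(6)}")
```

The wider card also changes the per-card input from $17 \times 4$ to $17 \times 6$ towers, so the region loop bounds in the clustering code need to change accordingly.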



**More tomorrow!**




# Questions?




# Acknowledgement

---

Lectures are compiled using content from Xilinx's public pages/examples or different user guides




# *Additional material*

# Assignment submission



- Where to submit:
  - <https://pages.hep.wisc.edu/~varuns/assignments/TAC-HEP/>
- Use your login machine credentials
- Submit one file per week
- Try to submit by following week's Tuesday

# Correct Time



**From 03.28.2023 onwards**

- Tuesdays: 9:00-10:00 CT / 10:00-11:00 ET / 16:00-17:00 CET
- Wednesday: 11:00-12:00 CT / 12:00-13:00 ET / 18:00-19:00 CET

# Jargons



- **IC - Integrated Circuit:** assembly of hundreds of millions of transistors on a small chip
- **PCB:** Printed Circuit Board
- **LUT - Look Up Table aka 'logic'** - generic functions on small bitwidth inputs. Combine many to build the algorithm
- **FF - Flip Flops** - control the flow of data with the clock pulse. Used to build the pipeline and achieve high throughput
- **DSP - Digital Signal Processor** - performs multiplication and other arithmetic in the FPGA
- **BRAM - Block RAM** - hardened RAM resource. More efficient memories than using LUTs for more than a few elements
- **PCIe or PCI-E - Peripheral Component Interconnect Express:** is a serial expansion bus standard for connecting a computer to one or more peripheral devices
- **InfiniBand** is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency
- **HLS** - High Level Synthesis - compiler for C, C++, SystemC into FPGA IP cores
- **DRCs** - Design Rule Checks
- **HDL** - Hardware Description Language - low level language for describing circuits
- **RTL** - Register Transfer Level - the very low level description of the function and connection of logic gates
- **FIFO** – First In First Out memory
- **Latency** - time between starting processing and receiving the result
  - Measured in clock cycles or seconds
- **II - Initiation Interval** - time from accepting first input to accepting next input
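
Latency and II together determine how long a pipelined block takes to process a stream of inputs. The standard relation, with all quantities in clock cycles, is sketched below; the function names are illustrative.

```python
def pipelined_cycles(n_inputs, latency, ii):
    """Cycles until the last result when a new input is accepted every `ii` cycles."""
    return latency + ii * (n_inputs - 1)


def sequential_cycles(n_inputs, latency):
    """Cycles when each input must finish before the next one starts."""
    return latency * n_inputs


n_inputs, latency = 100, 10
print(f"No pipelining:   {sequential_cycles(n_inputs, latency)} cycles")
print(f"Pipelined, II=2: {pipelined_cycles(n_inputs, latency, ii=2)} cycles")
print(f"Pipelined, II=1: {pipelined_cycles(n_inputs, latency, ii=1)} cycles")
```

For long input streams the total is dominated by the II term, which is why a fully pipelined design aims for II = 1 even if the latency grows.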

# Assignment Week-3



- Use target device: **xc7k160tfbg484-2**
  - Clock period of 10ns

1. Execute the code (lec5Ex2.tcl) using CLI (slide-25) and compare the results with GUI results for C-Simulation, C-Synthesis
2. Vary following parameters for two cases: high and very high values and compare with 1 for both CLI and GUI
   - Variable: "samples"
   - Variable: "N"
3. Run example lec3Ex2a

# Assignment Week-4



1. Do a matrix multiplication of two 1-dimensional arrays,  $A[N]*B[N]$ , where  $N > 5$
  - a) Report synthesis results without any pragma directives
  - b) Add as many pragma directives as possible
    - i. Report any conflicts (if reported in logs) between two pragmas
2. Compare the analysis perspective (Performance) for the different cases shared today
3. For Array\_partitioning, instead of using complete, use **block** and **cyclic** with different factors
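
As a reminder of what the two partitioning types do to the index mapping, the sketch below models where element `i` of an array ends up. It is a software illustration of the concept only and does not replace the synthesis comparison asked for above; the function names are illustrative.

```python
import math


def block_partition(index, size, factor):
    """(partition, offset) for block partitioning: consecutive chunks of the array."""
    block_length = math.ceil(size / factor)
    return index // block_length, index % block_length


def cyclic_partition(index, factor):
    """(partition, offset) for cyclic partitioning: elements are interleaved."""
    return index % factor, index // factor


size, factor = 8, 2
print("block :", [block_partition(i, size, factor)[0] for i in range(size)])
print("cyclic:", [cyclic_partition(i, factor)[0] for i in range(size)])
```

With cyclic partitioning, neighbouring elements land in different memories, which helps loops that read `a[i]` and `a[i + 1]` in the same cycle; block partitioning helps when the accesses are far apart.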

# Assignment Week-5



1. Do the exercise mentioned on slide-24
2. Implement a matrix multiplication using two for loops and compare results for pragma loop\_flatten & unroll
3. Write a simple program performing arithmetic operations (+, -, \*, /, %) between two variables, and compare the results obtained with standard C/C++ data types against arbitrary-precision types ap\_(u)int<N>
4. Write a program using an array with N (=10/15/20) elements and then restructure the code with a struct having N data members. Compare the results of the two programs