

# FE-I4 Firmware Development and Integration with FELIX for the Pixel Detector

Amitabh Yadav

August 11, 2017

## Abstract

CERN has planned a series of upgrades for the LHC. The last in the current series, designated the HL-LHC, will, as the name suggests, raise the luminosity to $5 \times 10^{34} \text{ cm}^{-2} \text{ s}^{-1}$. At the same time, the ATLAS Experiment will be extensively modified to meet the challenges of this upgrade (termed the Phase-II upgrade). The Inner Detector will be completely rebuilt for Phase-II: the TRT, SCT and Pixel detectors will be replaced by an all-silicon tracker, termed the Inner Tracker (ITk). The read-out of this future ITk detector is an engineering challenge in terms of the routing of services and the quality of the data. This document describes the FPGA firmware development that integrates the GBT, Elink and Rx-Tx Cores for communication between the FE-I4 modules and the FELIX read-out system.

**Keywords:** ATLAS, Phase-II Upgrade, ITk, FE-I4, FPGA, IP-Bus, GBT-FPGA, Elink, Tx Core, Rx Core

## 1 Introduction

CERN was founded in 1952 under the French name Conseil Européen pour la Recherche Nucléaire, in the Swiss canton of Geneva, on the French-Swiss border. CERN is the European high energy physics laboratory that uses the world's largest and most complex scientific instruments to study the basic constituents of matter.

The Large Hadron Collider (LHC) remains the latest addition to CERN's accelerator complex. First started up on 10 September 2008, the LHC consists of a 27-kilometer ring of superconducting electromagnets with a number of accelerating structures to boost the energy of the particles along the way. Inside the accelerator, two high-energy particle beams travel close to the speed of light in opposite directions in separate beam pipes kept at ultrahigh vacuum before they are made to collide at the detectors. The LHC has four detectors: ATLAS, CMS, ALICE and LHCb.



Figure 1: Artist representation of the ATLAS Experiment (left) and its inner detector (right).

The ATLAS Experiment is a general-purpose detector designed to exploit the full discovery potential of the LHC. ATLAS is about 45 meters long, more than 25 meters high and has an overall weight of approximately 7000 tonnes, which makes it the largest detector. It is divided into sub-detectors as shown in Figure 1 (left). The Inner Detector represents the innermost part of ATLAS, surrounded by a solenoid magnet, the Calorimeters, the Muon system and a very large air-core toroid magnet. It is designed to work at high luminosity ($10^{34} \text{ cm}^{-2} \text{ s}^{-1}$) with a bunch crossing every 25 ns. Therefore, it is built with highly sophisticated technologies and specialized materials.

The Inner Detector, shown in Figure 1 (right), is the first part of ATLAS to see the results of the collisions, so it is very compact and sensitive. It measures the direction, momentum and charge of the electrically charged particles produced in each proton-proton collision.

Long Shutdown 3, starting at the end of 2023, will include major performance upgrades of the accelerator for the high-luminosity phase (HL-LHC), which requires the replacement of several major detector components (Phase-II). With a nominal (ultimate) luminosity of $L = 5 \times 10^{34} \text{ cm}^{-2} \text{ s}^{-1}$ ($L = 7.5 \times 10^{34} \text{ cm}^{-2} \text{ s}^{-1}$) and an average of $\langle\mu\rangle = 140$ (200) inelastic proton-proton collisions per bunch crossing (pile-up), the HL-LHC will present an extremely challenging environment to the ATLAS experiment.



Figure 2: ITk 3D CAD model (left) and its visualization as implemented in the simulation (right)

The design of the ATLAS Upgrade inner tracker (ITk) has already been defined. It consists of several layers of silicon particle detectors. The innermost layers will be composed of silicon pixel sensors, and the outer layers will consist of silicon microstrip sensors. This contribution focuses on the strip region of the ITk. The central part of the strips tracker (the barrel) will be composed of rectangular short (approx. 2.5 cm) and long (approx. 5 cm) strip sensors. The forward regions of the strips tracker (the endcaps) consist of 6 disks per side, with trapezoidal microstrip sensors of various lengths and strip pitches. In response to the needs of the strip region of the ITk, highly modular structures are being studied and developed, called staves for the central region (barrel) and petals for the forward regions (endcaps). These structures integrate large numbers of sensors and readout electronics with precision lightweight mechanical elements and cooling structures. The silicon sensors are fabricated in n-in-p float zone (FZ) technology.

The FE-I4 is the pixel read-out chip used in the IBL and the candidate for the ITk Pixel demonstrator. It is an integrated circuit with readout circuitry for 26,880 hybrid pixels arranged in 80 columns on $250\,\mu\text{m}$ pitch by 336 rows on $50\,\mu\text{m}$ pitch. It is designed in a 130 nm feature size bulk CMOS process. Sensors must be DC coupled to the FE-I4 with negative charge collection. Each FE-I4 pixel contains an independent, free-running amplification stage with adjustable shaping, followed by a discriminator with an independently adjustable threshold. The chip keeps track of the firing time of each discriminator as well as the time over threshold (ToT) with 4-bit resolution, in counts of an externally supplied clock, nominally 40 MHz. Information from all discriminator firings is kept in the chip for a latency interval, programmable up to 255 cycles of the external clock. Within this latency interval, the information can be retrieved by supplying a trigger. The data output is serial over a current-balanced pair (similar to LVDS). The primary output mode is 8b/10b encoded at a rate of 160 Mb/s. The FE-I4 is controlled by a serial LVDS input synchronized to the external clock. No further I/O connections are required for regular operation, but several others are supported for testing.
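As a quick sanity check of the figures quoted above, the following minimal sketch recomputes the pixel count, the pixel matrix dimensions, the maximum trigger latency and the payload rate behind the 8b/10b link. All variable names are illustrative; the only inputs are the numbers given in the text, plus the general property of 8b/10b coding that 8 payload bits are carried in every 10 line bits.

```python
# Figures quoted in the text.
COLUMNS, ROWS = 80, 336
COLUMN_PITCH_UM, ROW_PITCH_UM = 250, 50
CLOCK_PERIOD_NS = 25          # nominal 40 MHz external clock
MAX_LATENCY_CYCLES = 255      # upper limit of the programmable latency
LINE_RATE_MBPS = 160          # 8b/10b-encoded serial output rate

pixels = COLUMNS * ROWS
width_um = COLUMNS * COLUMN_PITCH_UM
height_um = ROWS * ROW_PITCH_UM
max_latency_ns = MAX_LATENCY_CYCLES * CLOCK_PERIOD_NS
# 8b/10b carries 8 payload bits in every 10 line bits.
payload_rate_mbps = LINE_RATE_MBPS * 8 // 10

print(pixels)                 # 26880
print(width_um, height_um)    # 20000 16800
print(max_latency_ns)         # 6375
print(payload_rate_mbps)      # 128
```

In other words, a trigger must arrive within roughly 6.4 µs of the hit, and each 160 Mb/s link carries at most 128 Mb/s of decoded data.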

The read-out chain that we are evaluating uses the FE-I4 and the FELIX. FELIX is the Front-End Link eXchange system that will interface the detector front-end electronics with the data collecting and processing components of the ATLAS experiment. FELIX will be built up with commodity network and standard technologies. We are designing the firmware for a Xilinx 7 Series FPGA to connect them both.

## 2 Methodology

The aim of the firmware development is to create a fast and reliable data read-out system for the ITk Stave demonstrator programme. This is implemented by interfacing the FE-I4 module with a commercial read-out electronics board (in our case, a Xilinx 7 Series FPGA board) and integrating it with the read-out system for the ATLAS Phase-II upgrade (FELIX).

The complete system thus consists of 3 components (Figure 3): 1. The FE-I4 Pixel Module, 2. The Front End FPGA read-out board and 3. The Back End Electronics (FELIX).



Figure 3: Hardware Set-up to test Soft GBT Firmware

### 2.1 FPGA Description and VIVADO Design Suite

An FPGA is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence *field-programmable*. The FPGA configuration is generally specified using a hardware description language (HDL). FPGAs contain an array of programmable logic blocks and a hierarchy of reconfigurable interconnects that allow the blocks to be wired together, like many logic gates that can be inter-wired in different configurations. Logic blocks can be configured to perform complex combinational functions, and sequential functions using memory elements such as flip-flops. The main advantage of FPGAs is the ability to re-program them in the field to fix bugs; further advantages may include a shorter time to market and lower non-recurring engineering costs.

Xilinx 7 series FPGAs comprise four FPGA families that address the complete range of system requirements, ranging from low cost, small form factor, cost-sensitive, high-volume applications to ultra high-end connectivity bandwidth, logic capacity and signal processing capability for the most demanding high-performance applications. The 7 series FPGAs include: (i) Spartan-7, (ii) Artix-7, (iii) Kintex-7 and (iv) Virtex-7. The features of 7 series FPGA are as follows:

1. Advanced high-performance FPGA logic based on real 6-input look-up table (LUT) technology configurable as distributed memory.
2. 36 Kb dual-port block RAM with built-in FIFO logic for on-chip data buffering.
3. High-performance SelectIO technology with support for DDR3 interfaces up to 1,866 Mb/s.
4. High-speed serial connectivity with built-in multi-gigabit transceivers from 600 Mb/s to maximum rates of 6.6 Gb/s up to 28.05 Gb/s, with special low-power mode, optimized for chip-to-chip interfaces.
5. Powerful clock management tiles (CMT), combining phase-locked loop (PLL) and mixed-mode clock manager (MMCM) blocks for high precision and low jitter.

The FPGAs chosen for the F/W implementation are the Kintex 7 and the Virtex 7. A summary of their features is given below:

| Maximum Capability | Kintex 7                                         | Virtex 7                      |
|--------------------|--------------------------------------------------|-------------------------------|
| Logic Cells        | 478K                                             | 1955K                         |
| Block RAM          | 34Mb                                             | 68Mb                          |
| Transceivers       | 32                                               | 96                            |
| Transceiver Speed  | 12.5 Gb/s                                        | 28.05 Gb/s                    |
| Serial Bandwidth   | 800 Gb/s                                         | 2784 Gb/s                     |
| PCIe Interface     | x8 Gen2                                          | x8 Gen3                       |
| Memory Interface   | 1866 Mb/s                                        | 1866 Mb/s                     |
| I/O Pins           | 500                                              | 1200                          |
| I/O Voltage        | 1.2 V to 3.3 V                                   | 1.2 V to 3.3 V                |
| Package Options    | Lidless Flip-Chip and High-Performance Flip-Chip | Highest Performance Flip-Chip |

Table 1: Summary of specifications of Xilinx 7 Series: Kintex 7 & Virtex 7

The Vivado Design Suite is an IDE for writing HDL for the Xilinx family of FPGA devices. It replaces the existing Xilinx ISE Design Suite of tools and helps accelerate design implementation with place and route tools that analytically optimize for multiple and concurrent design metrics, such as timing, congestion, total wire length, utilization and power. It provides design analysis capabilities at each design stage, which allows design and tool settings to be modified earlier in the design process.

Vivado is available for download from the Xilinx website and the licenses required to implement the code for the 7 Series FPGA are available from CERN through a License Server.

### 2.2 FE-I4

The FE-I4 modules considered are the FE-I4B single chip card and the FE-I4B DBM module, both shown in Figure 4. Each has 3 pairs of Low Voltage Differential Signal (LVDS) pins, namely DATA, CMD and CLK (for data output, command input and clock, respectively).



Figure 4: FE-I4B single chip card (left) and FE-I4B DBM Module with FLEX connector (right)

### 2.3 FPGA Firmware Setup

The overall firmware (FW) running on the front end electronics system consists of the following FW components (Figure 6):

1. **IP-Bus**: Uses Ethernet to communicate with the PC. Monitors packet counts, resets, etc. by reading and writing FPGA registers. *Source: CMS Level-1 Trigger.*
2. **GBT**: GBT F/W to transmit data at 3.8-4.4 Gbps via an optical link (SFP). It utilises the GBT Bank [1].
3. **RX Core**: Contains an 8b/10b decoder and a deserializer (8-bit serial data to parallel, with a 160 MHz clock). *Source: YARR and Xilinx.*
4. **TX Core**: Contains a serializer (8-bit parallel data from the E-Link bank to serial, with a 160 MHz clock) but no 8b/10b encoder. *Source: YARR and Xilinx.*
5. **E-Links**: Developed to receive commands from the GBT FW and send them to the Tx core to be serialized, and to receive deserialized data from the Rx core and send it to the GBT FW. *Source: self-developed.*

Figure 5: Detailed Soft-GBT Firmware Architecture



Figure 6: Soft-GBT Firmware Block Diagram

The Front End Electronics read-out system is set up to perform the following functions:

1. Acquire 8b/10b encoded data from 10 FE-I4 Pixel Modules using serial communication through FMC.
2. Send the commands (cmd) received through the GBT FW to the FE-I4 through the Elinks.
3. Transmit GBT-format data via optical SFP links from VC707 to KC709.
4. Control the data and command lines through the Ethernet port using the IPBus Core.

### 2.4 Firmware Protocols

To communicate with the various hardware peripherals, the firmware implements the protocol that each of them requires. This section describes these protocol firmwares and their implementation in detail.

#### 2.4.1 IPBus

IPBus communication goes through a single port (Ethernet, RJ-45) on the target. It places a single 32-bit header at the front of every IPbus packet, which is necessary to ensure 100% reliable IPbus communication when using a potentially unreliable transport protocol such as UDP. Some of its features are as follows:

1. Application Layer Protocol, highly flexible and ubiquitous.
2. Extensively-tested, tightly-integrated suite with Gigabit performance.
3. Easily scalable control system.
4. Applicable to any hardware with Ethernet interface.

The IP Bus control packet (Figure ??) is composed of:

- Ethernet packet (payload): 1,500 B
- IP header: 20 B
- UDP header: 8 B
- IPbus packet: 1,472 B (or 368 32-bit words)
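The IPbus packet size follows directly from the other three numbers. The short sketch below reproduces the arithmetic; the constant names are illustrative and the values are the ones listed above.

```python
ETHERNET_PAYLOAD_BYTES = 1500   # size of the Ethernet packet quoted above
IP_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
WORD_BYTES = 4                  # IPbus works in 32-bit words

# What is left for IPbus once the IP and UDP headers are removed.
ipbus_bytes = ETHERNET_PAYLOAD_BYTES - IP_HEADER_BYTES - UDP_HEADER_BYTES
ipbus_words = ipbus_bytes // WORD_BYTES

print(ipbus_bytes)   # 1472
print(ipbus_words)   # 368
```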

The IPBus protocol is well suited to this firmware because it allows the communication parameters to and from the optical link and the FE-I4 to be monitored: the parameter values are computed in the firmware and written to FPGA registers, which can then be read directly at the front end for testing and debugging.

#### 2.4.2 GBT-FPGA

The GBT Protocol is used to implement multipurpose high speed (3.2-4.48 Gbps user bandwidth) bidirectional optical links for high-energy physics experiments.

Logically the link provides three distinct data paths for Timing and Trigger Control (TTC), Data Acquisition (DAQ) and Slow Control (SC) information. In practice, the three logical paths do not need to be physically separated and are merged on a single optical link as indicated in Figure 7. The aim of such architecture is to allow a single bidirectional link to be used simultaneously for data readout, trigger data, timing control distribution, and experiment slow control and monitoring. This link establishes a point-to-point, optical, bidirectional (two fibres), constant latency connection that can function with very high reliability in the harsh radiation environment typical of high energy physics experiments at LHC.



Figure 7: Link architecture with the GBT chip set and the Versatile Link opto-components

This GBT-FPGA core is now a full library, targeting FPGAs from ALTERA and XILINX, allowing the implementation of one or several GBT links of 2 different types: *Standard* or *Latency-Optimized* (providing low, fixed and deterministic latency on Tx, on Rx or on both). These links can also be configured to provide any encoding mode offered by the GBTx, namely the *GBT-Frame* mode (Reed-Solomon based) or the *Wide-Bus* mode (no encoding); the configuration is done through the GBT User Setup File. The *GBT-Frame*, shown in Figure 8, adopts a Reed-Solomon code that can correct bursts of bit errors caused by Single Event Upsets (SEU). This encoding scheme can be used for Data Acquisition (DAQ), Timing, Trigger & Control (TTC) and Experiment Control (EC).



Figure 8: The GBT encoding frame
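The user bandwidth range quoted at the start of this section can be related to the frame layout with a little arithmetic. The sketch below assumes the field widths given in the GBT documentation (a 120-bit frame transmitted once per 40 MHz clock cycle, with a 4-bit header, 4 bits of slow control, 80 data bits and 32 bits of forward error correction in *GBT-Frame* mode, and 112 data bits with no error correction in *Wide-Bus* mode). These widths and all names in the code are assumptions made for illustration and should be checked against Figure 8 and the user guide [1].

```python
FRAME_CLOCK_MHZ = 40  # one frame per bunch-crossing clock cycle

# Assumed field widths in bits (see the lead-in above).
GBT_FRAME = {"header": 4, "slow_control": 4, "data": 80, "fec": 32}
WIDE_BUS = {"header": 4, "slow_control": 4, "data": 112}


def rate_mbps(bits_per_frame: int) -> int:
    """Bit rate in Mb/s for a field sent once per frame."""
    return bits_per_frame * FRAME_CLOCK_MHZ


line_rate = rate_mbps(sum(GBT_FRAME.values()))
gbt_frame_data = rate_mbps(GBT_FRAME["data"])
gbt_frame_user = rate_mbps(GBT_FRAME["data"] + GBT_FRAME["slow_control"])
wide_bus_data = rate_mbps(WIDE_BUS["data"])

print(line_rate)        # 4800
print(gbt_frame_data)   # 3200
print(gbt_frame_user)   # 3360
print(wide_bus_data)    # 4480
```

Under these assumptions, the 84 bits handled by the firmware described in this document correspond to the 80 data bits plus the 4 slow-control bits of the *GBT-Frame* mode.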

The different components of the GBT-FPGA Core are integrated in a single module called *GBT Bank*. The GBT Bank may include several *GBT Links*. Each GBT Link is composed of a GBT Tx, a GBT Rx (both together will be referred to as *GBT Logic*) and a Multi-Gigabit Transceiver (MGT). The clocking resources are external to the GBT Bank so the user can connect the different clocks as desired. The GBT Bank is the top module of the GBT-FPGA Core; it can integrate several GBT Links and the different ports required for the operation of the GBT Links.

A simplified block diagram of a GBT Bank instantiating two GBT Links is shown in Figure 9 (left).



Figure 9: The GBT Bank (left) and GBT Link (right)

The GBT Link is the actual channel of the link (Figure 9 (right)). It is composed of a GBT Tx (that scrambles and encodes the transmitted parallel data), a Multi-Gigabit Transceiver (MGT) (that serializes, transmits, receives and de-serializes the data) and a GBT Rx (that aligns, decodes and descrambles the incoming data stream).

#### 2.4.3 Elink with Tx & Rx Cores

The FE-I4 modules are designed for serial communication, with data in 8b/10b encoded format (10 bits) and commands of 8 bits. However, as shown in Figure 8, the GBT Bank is designed to output 84 bits of parallel data. Therefore, in order to interface with the FE-I4 modules, the data from the FE-I4 must be deserialized and decoded, and the commands to the FE-I4 must be serialized.

YARR (Yet Another Rapid Readout) is a read-out firmware that has two built-in sub-firmwares, the Tx Core and the Rx Core. The read-out firmware uses the Tx and Rx cores, after the necessary modifications, to send and receive data in the aforementioned format.

A brief summary of the function of the Tx and Rx cores is as follows:

**Tx Core Operation:**

- Receives the 84-bit cmd word from the GBT (parallel).
- Re-routes the incoming 84-bit word to 10 different routes of 8 bits each.
- Uses 10 Tx cores (YARR) to serialize the 8-bit cmd.
- The OBUFDS primitive converts the serial stream to a differential pair signal to send the cmd to the FE-I4.



Figure 10: Tx Core
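The Tx data path can be modelled in a few lines of software. The sketch below is only a behavioural illustration, not the VHDL used in the firmware: it assumes that the ten 8-bit commands occupy the low 80 bits of the 84-bit word (channel 0 in the lowest byte, with the upper 4 bits reserved for slow control) and that each command is shifted out most significant bit first. Both the bit layout and the function names are hypothetical.

```python
from typing import List

NUM_CHANNELS = 10   # one route per FE-I4
CMD_BITS = 8
FRAME_BITS = 84


def split_commands(frame: int) -> List[int]:
    """Split the low 80 bits of an 84-bit word into ten 8-bit commands.

    Assumed layout: channel n occupies bits [8n+7 : 8n].
    """
    if not 0 <= frame < (1 << FRAME_BITS):
        raise ValueError("frame must fit in 84 bits")
    mask = (1 << CMD_BITS) - 1
    return [(frame >> (CMD_BITS * ch)) & mask for ch in range(NUM_CHANNELS)]


def serialize(cmd: int) -> List[int]:
    """Return the bits of an 8-bit command, MSB first (assumed bit order)."""
    return [(cmd >> bit) & 1 for bit in range(CMD_BITS - 1, -1, -1)]


# Example: command 0xB1 on channel 0, 0x0F on channel 9.
frame = 0xB1 | (0x0F << 72)
commands = split_commands(frame)
print([hex(c) for c in commands])
print(serialize(commands[0]))   # [1, 0, 1, 1, 0, 0, 0, 1]
```

In hardware the ten serializers run in parallel, one per FE-I4, and each output is driven off-chip as a differential pair.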

**Rx Core Operation:**

- The IBUFGDS primitive converts the differential input to a single-ended signal.
- Receives the FE-I4 data (serial input).
- Uses 10 Rx cores (YARR) to deserialize the incoming 8-bit data.
- Waits for a specified time to receive data from each FE-I4.
- Packs the 4-bit slow control and the 8-bit data of each FE-I4 sequentially into an 84-bit GBT frame, with NULL where applicable.
- Sends the 84-bit parallel data frame to the GBT.



Figure 11: Rx Core
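The packing step on the Rx side can be sketched in the same behavioural style. The example below assumes a hypothetical layout in which the 4-bit slow control occupies the top four bits of the 84-bit frame and the byte from FE-I4 channel n occupies bits [8n+7 : 8n]; a channel that delivered nothing within the waiting time is represented by `None` and filled with an assumed NULL value of `0x00`. The actual bit positions and NULL encoding used in the firmware are not specified in this document.

```python
from typing import Optional, Sequence

NUM_CHANNELS = 10
DATA_BITS = 8
SLOW_CONTROL_BITS = 4
NULL_BYTE = 0x00   # assumed filler for channels with no data


def pack_frame(slow_control: int, data: Sequence[Optional[int]]) -> int:
    """Pack 4 slow-control bits and ten 8-bit words into one 84-bit frame."""
    if not 0 <= slow_control < (1 << SLOW_CONTROL_BITS):
        raise ValueError("slow control must fit in 4 bits")
    if len(data) != NUM_CHANNELS:
        raise ValueError("expected one entry per FE-I4 channel")
    frame = slow_control << (NUM_CHANNELS * DATA_BITS)
    for channel, byte in enumerate(data):
        value = NULL_BYTE if byte is None else byte
        if not 0 <= value < (1 << DATA_BITS):
            raise ValueError("data word must fit in 8 bits")
        frame |= value << (DATA_BITS * channel)
    return frame


# Example: only channels 0 and 3 delivered data in this frame.
words = [0x11, None, None, 0xFC] + [None] * 6
frame = pack_frame(0xA, words)
print(f"{frame:021x}")   # 84 bits as 21 hex digits
```

Here 10 × 8 data bits plus 4 slow-control bits fill the 84-bit frame exactly, which is why ten FE-I4 modules are served per GBT link in this set-up.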

**More Operations:**

- Generates and provides the necessary clocks for the different components of the RX and TX Cores.
- Checks for no-data cases.
- Sets data in FPGA registers to count the number of data packets sent and received; the counters can be activated/deactivated using IPBus commands.



Figure 12: Clock Multiplier using MMCM and PLL
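To illustrate how such a clock multiplier is parameterised, the sketch below searches for integer MMCM settings that derive 160 MHz and 640 MHz outputs from the 40 MHz input. It uses the usual relations $f_{VCO} = f_{in} \cdot M / D$ and $f_{out} = f_{VCO} / O$. The VCO range of 600 to 1200 MHz is an assumption (it depends on the device and speed grade), the search ignores fractional dividers and other constraints that the Xilinx clocking IP handles, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed VCO operating range in MHz; check the device data sheet.
VCO_MIN_MHZ, VCO_MAX_MHZ = 600, 1200


def find_mmcm_settings(
    f_in_mhz: int, targets_mhz: List[int], max_mult: int = 64, max_div: int = 8
) -> Optional[Tuple[int, int, Dict[int, int]]]:
    """Return (M, D, {f_out: O}) for the first integer solution, or None."""
    for div in range(1, max_div + 1):
        for mult in range(2, max_mult + 1):
            if (f_in_mhz * mult) % div:
                continue  # keep the VCO frequency an integer number of MHz
            vco = f_in_mhz * mult // div
            if not VCO_MIN_MHZ <= vco <= VCO_MAX_MHZ:
                continue
            if all(vco % target == 0 for target in targets_mhz):
                return mult, div, {t: vco // t for t in targets_mhz}
    return None


print(find_mmcm_settings(40, [160, 640]))   # (16, 1, {160: 4, 640: 1})
```

With these assumptions the VCO runs at 640 MHz (40 MHz × 16), and the two outputs are obtained with output dividers of 4 and 1.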

### 2.5 Elink Construction



Figure 13: Elink Structure

## 3 Conclusions and Future Work

Our first version of the firmware for the read-out of the ITk Stave demonstrator program has been conceived using existing firmware tools based on the YARR read out system and the GBT Core provided by CERN.

First, the IPBus for tri-mode Ethernet was implemented and tested on a Xilinx KC705 FPGA board. The GBT F/W was then synthesized and implemented in Vivado; its FPGA implementation can only be completed once it is integrated with all the other components. We then extracted the Rx Core and Tx Core from YARR for encoding and decoding, and modified the cores as required. The main challenges that arose with the RX and TX Cores were black-box instance errors (SERDES in the Rx Core; cmp fifo in the Tx Core), which were fixed by gathering more references on the project. Both cores have now been fixed and implemented. The TX Core was successfully simulated, whereas the RX Core still needs modifications (unwanted components need to be removed) before it can be simulated. Further, we completed the clock multiplier using the MMCM/PLL Xilinx IP Core Clocking Wizard (which uses 40 MHz to generate 160 MHz, 640 MHz, etc.); it will be used in the design of the E-link to provide appropriate clocks to the components of the Rx core. The E-Link design for the Tx segment is complete, with the differential signal cmd\_out going to the FE-I4. The Rx segment is in progress.

The design and development of the Rx-segment E-link Bank FW for GBT and FE-I4 communication is in progress. The implementation requires connecting the clock multiplier segment to the Rx core and integrating everything in the E-link design.

The next step is to connect the E-Link design to FPGA registers (to set/reset, count data packets, etc.) and to define and connect these registers to IPBus in order to monitor the communication parameters.

The final step will be to integrate all the components, perform logic debugging and test timing synchronization by simulation for each component, define the netlist for the Xilinx 7 Series FPGA for each component, apply the necessary changes and deploy to hardware.

## References

- [1] EP-ESE, “GBT-FPGA User Guide, ver. 1.40.” <https://espace.cern.ch/GBT-Project/GBT-FPGA>.

## Acknowledgements

I would like to express my deepest gratitude to Dr. Carlos Solans Sanchez and Mr. Abhishek Sharma for giving me this opportunity to contribute to the ATLAS experiment at CERN and for supervising my project throughout my internship. I would like to thank my parents, family and friends, who have always been there to support and guide me all the way through life. Finally, I thank the University of Petroleum and Energy Studies and my professors for guiding and motivating me and for making me capable of working at CERN.