

## DCP development and technical note

### A. Hardware design

- An input signal adapter board is usually needed when the PET has more detectors than the FPGA has input ports.
- A signal adapter (PCB) that streamlines input singles data into the FPGA I/O was developed. When the number of detectors is greater than the number of FPGA I/O ports, the signal adapter merges the signals from multiple detectors by dividing the link bandwidth among them. For example, the singles data from each detector needs <50 Mbps, whereas the LVDS cable data transmission rate is 400–1000 Mbps. Thus, singles events from multiple detectors can be packed, transmitted together through one FPGA I/O port, unpacked, and distributed to different CPs.
- The components of a single CP and their functions were defined for general coincidence data acquisition and implemented. The CP was designed as a *core library function in DCP firmware programming*. A separate version can be adapted for each of the two main FPGA product lines (Intel Corporation and Xilinx), so that it can be used with different FPGA architectures. The design was streamlined with respect to firmware code structure, resource use, and other advanced or practical features, such as runtime monitoring, modular and parameterized code, and the user interface.
- The replication and distribution of N CPs, where N is the number of valid coincidence detector pairs defined by the specific implementation, were configured and analyzed with different FPGA resource use and routing strategies. The objective was to minimize resource use and routing congestion while still providing good DCP capability and performance for a large number of coincidence processors. *The tradeoffs among resource use, routing capability, and DCP capability/performance were weighed for the practical implementation.*
- The overall hardware design targets optimized functionality of a single CP, efficient resource use, minimal routing congestion, and flexibility for expansion and improvement. This provides the basis for improving DCP capability and performance and for simplifying implementation and modification.
- This project provides a core DCP technology that was implemented and evaluated on our specific setup. Although we demonstrated the DCP implementation with full-scale signal input and output, the open-source release through this website is limited to the DCP firmware code and supporting information. Providing the input of singles events and the output of coincidence events for a specific PET scanner is beyond the scope of this project and the corresponding DCP technology development, because those interfaces must be tied to that scanner's design and specifications. The signal adapter board and inter-board signal interface in this project were developed solely for testing and demonstrating the functionality and performance of the DCP.
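
As a rough illustration of the signal-adapter arithmetic in the list above, the following sketch estimates how many detectors could share one LVDS link when each detector needs at most 50 Mbps. The function name `detectors_per_link` is hypothetical, and the estimate assumes no packing overhead; a real adapter would lose some bandwidth to framing and channel tags.

```python
def detectors_per_link(link_rate_mbps: float, singles_rate_mbps: float = 50.0) -> int:
    """Upper bound on detectors sharing one link, ignoring packing overhead (assumption)."""
    if link_rate_mbps <= 0 or singles_rate_mbps <= 0:
        raise ValueError("rates must be positive")
    return int(link_rate_mbps // singles_rate_mbps)


# The two ends of the 400-1000 Mbps LVDS range quoted in the text.
for rate in (400, 1000):
    print(rate, "Mbps link ->", detectors_per_link(rate), "detectors at most")
```

Under these assumptions, one link could carry between 8 and 20 detectors, which is consistent with the 8-channel adapter used for testing.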

### B. Firmware design

- The firmware code was **modularized** and **parameterized** to ease code improvement, maintenance, implementation, and modification. The following table shows the modules for the singles data input, the different DCP functions, and the coincidence data output. The ports for parameterization will be embedded in the code through these modules:

| INPUT | SCATTER-STAGE | SELECT-STAGE | GATHER-STAGE | OUTPUT |
|-------|---------------|--------------|--------------|--------|
| • Receiver<br>• Event decoder<br>• Singles buffer<br>• Online-monitor interface<br>• High-speed FPGA-FPGA transmitter | • Pair-list importer<br>• Event replicator<br>• Event distributor | • Single-pair processor<br>• Processor replicator<br>• Single-pair output buffer<br>• Online config / monitor interface | • Coincidence event collector<br>• Online monitor interface | • Event packer<br>• High-speed FPGA-PC transmitter |

*Modules of the firmware code for DCP functions and input/output data processing.*
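
The scatter, select, and gather stages named in the table can be pictured with a small software model. This is only a sketch under stated assumptions: the firmware processes streaming data in parallel hardware, whereas this model works on a short list in memory, and the names `Single`, `scatter`, `select`, and `gather`, the pair list, and the event values are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Single(NamedTuple):
    detector: int   # detector ID
    time_ns: int    # event time stamp in ns


def scatter(singles, pair_list):
    """Replicate each single to every CP whose detector pair includes its detector."""
    inbox = defaultdict(lambda: ([], []))
    for cp_index, (det_a, det_b) in enumerate(pair_list):
        for s in singles:
            if s.detector == det_a:
                inbox[cp_index][0].append(s)
            elif s.detector == det_b:
                inbox[cp_index][1].append(s)
    return inbox


def select(events_a, events_b, window_ns):
    """One CP: keep pairs whose timing difference lies within +/- window_ns."""
    return [(a, b) for a in events_a for b in events_b
            if abs(a.time_ns - b.time_ns) <= window_ns]


def gather(inbox, window_ns):
    """Collect coincidences from all CPs as (cp_index, single_a, single_b)."""
    out = []
    for cp_index in sorted(inbox):
        events_a, events_b = inbox[cp_index]
        out.extend((cp_index, a, b) for a, b in select(events_a, events_b, window_ns))
    return out


pair_list = [(1, 7), (4, 10), (1, 4)]  # hypothetical detector pairs, one per CP
singles = [Single(1, 100), Single(7, 112), Single(4, 300), Single(10, 420)]
coincidences = gather(scatter(singles, pair_list), window_ns=50)
print(coincidences)
```

In this example only the first CP reports a coincidence, because only detectors 1 and 7 fired within 50 ns of each other.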

- A mechanism that parameterizes the timing (and other) information in the data vector of singles events permits a user to extract the singles' timing from any specific data format for DCP coincidence processing without needing in-depth knowledge of the source code.
- The FPGA resource use was defined according to the hardware design. A passive (automatic) resource allocation generated by a compiler could hit an early ceiling of resource occupation due to the excessive use of a particular resource type. To prevent this problem, we adjusted resource allocation as part of the iterative design-development process.
- Depending on the implementation, compiled FPGA programs usually vary in execution speed, input/output delay, resource use, etc. Different implementations of the core functions of a single CP were tested and compared to fine-tune the DCP performance. Logical behavior-level simulations of various modules were performed to check and compare runtime timing results and register transfer-level resource usage.
- Since each CP works independently and in parallel, the main design effort focused on workload distribution among FPGAs, synchronization among FPGAs, and high-speed inter-FPGA data transmission.
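
The parameterized timing extraction described in the list above can be sketched as a bit-field read whose position and width are parameters. The 32-bit word layout, the constants `TIME_LSB` and `TIME_WIDTH`, and the function `extract_field` are assumptions for illustration only; they are not the data format or port names used by the DCP firmware.

```python
def extract_field(word: int, lsb: int, width: int) -> int:
    """Return the `width`-bit field that starts at bit `lsb` of `word`."""
    if lsb < 0 or width <= 0:
        raise ValueError("lsb must be >= 0 and width must be > 0")
    return (word >> lsb) & ((1 << width) - 1)


# Hypothetical singles word: bits [31:24] detector ID, [23:8] energy, [7:0] timing.
TIME_LSB, TIME_WIDTH = 0, 8
word = (5 << 24) | (0x1234 << 8) | 0xA7

timing = extract_field(word, TIME_LSB, TIME_WIDTH)
print(hex(timing))
```

Changing only `TIME_LSB` and `TIME_WIDTH` adapts the same read to a different data format, which is the idea behind parameterizing the timing field.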



### C. Implementation with a single FPGA board

- A commercially available FPGA board was used. An Arria 10 development board (Intel Corporation) was originally planned for the implementation but was out of stock during the pandemic. An Intel Stratix 10 development board was used instead; it has enough capacity to implement a DCP with $\geq 400$ CPs.
- An 8-channel input signal adapter (PC board) was developed for testing and evaluation. The signals from the detectors were transferred via LVDS cables to the adapter board and then processed by the DCP. In principle, more input channels can be accommodated either by stacking multiple 8-channel adapter boards, as shown in the previous study [1], or by using an adapter board with more input channels.



### D. Implementation with two cascaded Stratix IV (S4) FPGA boards

- A DCP cascaded across multiple boards was implemented and tested with two existing low-end FPGA boards (Stratix IV GX EP4SGX230). Cascading allows the DCP to scale to a large number of coincidence processors without being limited by the capacity of a single FPGA.
- Each board was implemented with 50 CPs. This number was chosen only for testing in this project; the number of CPs is limited only by the available FPGA resources.
- Compared with the single-board DCP, the major difference in the multi-board DCP is the addition of inter-board transmission and reception of singles data, which is needed to pair detectors connected to different boards.
- The same single-board DCP was implemented on both S4 boards, *plus* the extra *inter-board data communications*.
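
To make the role of the inter-board link concrete, the sketch below splits detector pairs into those that can be formed on one board and those that need singles data from the other board. It assumes the detector-to-board assignment used in the two-FPGA evaluation (detectors 1 to 5 on one board, 6 to 10 on the other) and, for simplicity, enumerates every possible pair rather than only the valid coincidence pairs. The names `split_pairs` and `board_of` are hypothetical.

```python
from itertools import combinations


def split_pairs(board_of):
    """Split all detector pairs into same-board and cross-board lists."""
    local, cross = [], []
    for a, b in combinations(sorted(board_of), 2):
        (local if board_of[a] == board_of[b] else cross).append((a, b))
    return local, cross


# Assumed mapping: detectors 1-5 on board 0, detectors 6-10 on board 1.
board_of = {det: 0 if det <= 5 else 1 for det in range(1, 11)}
local, cross = split_pairs(board_of)
print(len(local), "same-board pairs,", len(cross), "inter-board pairs")
```

Every pair in `cross` requires one detector's singles to be transmitted between boards before a CP can process it.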



### E. Evaluation and results

- DCP basic functionality and performance – evaluated with internal simulated event trigger signals

A very useful tool developed during the evaluation is an embedded FPGA program that generates “simulated” trigger pulses mimicking the digital output of detected singles events. It is based on random number generators and can deliver paired “pseudo” coincidence event signals, with an adjustable timing difference between them, to any chosen pair of DCP input channels. Because it requires no external setup or signal input, it served as a handy diagnostic and debugging tool for testing basic DCP functionality under controllable, ideal signal-detection scenarios, checking for DCP errors, and monitoring logic processing.
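
A software analogue of this generator can be sketched as follows. It is not the embedded FPGA program: it uses Python's `random` module instead of hardware random number generators, and the function `simulated_pairs` and all of its parameters are hypothetical names chosen for illustration.

```python
import random


def simulated_pairs(n_pairs, chan_a, chan_b, delta_range_ns=(0, 256),
                    period_ns=256, seed=None):
    """Yield ((chan_a, t_a), (chan_b, t_b)) with t_b - t_a uniform in delta_range_ns."""
    rng = random.Random(seed)
    lo, hi = delta_range_ns
    for _ in range(n_pairs):
        t_a = rng.randrange(period_ns)   # integer ns tick in [0, period_ns)
        delta = rng.randint(lo, hi)      # integer ns, both ends inclusive
        yield (chan_a, t_a), (chan_b, t_a + delta)


# Deliver 1000 paired events to channels 0 and 1 with |t_b - t_a| <= 20 ns.
pairs = list(simulated_pairs(1000, chan_a=0, chan_b=1,
                             delta_range_ns=(-20, 20), seed=1))
print(pairs[:3])
```

Choosing `chan_a`, `chan_b`, and `delta_range_ns` plays the role of the setting parameters that select the input channels and the timing-difference range.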

The following screenshot shows two simulated singles event trigger signals with a variable timing difference between the paired events. The range of the timing difference can be defined through the setting parameters:



The following figure shows the distribution of input paired coincidence events across all 432 available coincidence pairs. This uniform distribution shows that the internal simulated coincidence events can be generated and delivered as desired to any or all coincidence pairs to be investigated.



Two input digital event (singles event) trigger signals were paired and fed to DCP input channels A and B. The timing of each event trigger signal input to channel A or B was randomly distributed from 0 to 256 ns, and the timing difference between the two singles events of each pair was also randomly distributed from 0 to 256 ns.

The following figures (a) and (b) show the distributions of singles event timings *measured* from input channels A and B. As expected, apart from small variations due to the limited randomness of the random number generator, both timing distributions are quite uniform.

Figure (c) shows a uniform distribution of timings from both channels A and B, obtained by joining the timings of paired singles events. Figure (d) shows a triangular distribution, apart from some small variations near the peak that were mainly due to quantization errors. These distributions show that the timings measured from channels A and B were both random, as expected, and that no intrinsic correlation was observed between the measured paired event timings.
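
One way to see why a triangular shape is expected is to count timing differences exactly. The sketch below assumes that figure (d) plots the difference between two independent timings, each uniform on an integer grid of 256 one-nanosecond ticks; both the interpretation of the figure and the function name `difference_counts` are assumptions.

```python
from collections import Counter


def difference_counts(n_ticks: int) -> Counter:
    """Count every (t_a, t_b) combination on the grid 0..n_ticks-1 by t_a - t_b."""
    return Counter(t_a - t_b for t_a in range(n_ticks) for t_b in range(n_ticks))


counts = difference_counts(256)
print(counts[0], counts[128], counts[255])  # prints 256 128 1
```

The number of combinations with difference d is 256 - |d|, which falls linearly from the peak at zero and gives the triangular profile.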



The following shows the measured event rate compared to the expected event rate.



The difference between the measured and expected event rates (event rate error) is shown below. No error was measured at event rates up to 10 M events/s.



The impact of the DCP timing window on coincidence event acquisition was studied by feeding paired events to the DCP with windows of $\pm 20$ ns, $\pm 50$ ns, and $\pm 100$ ns. The following plot shows that the timing-difference distributions follow the timing window, as expected.



- DCP functionality and performance – evaluated with external digital event trigger signals

Two different *external* signals were used to evaluate DCP performance for the implementation with a single FPGA or two cascaded FPGAs (the setups are shown in the photos after this list):

- Pulsed signals from a PCB based signal generator. They were used to assess the basic DCP functionality and performance, such as the count rate and error in coincidence event selection.
- Signals from real PET detectors' output with a Na-22 point source placed at the center of a prototype small animal PET. They were used to evaluate DCP performance and application for practical PET acquisition.



*Setup of evaluations with the single FPGA board (left) and two cascade FPGA boards (right). An additional external pulsed signal generator PC board was used to input pulsed signals to FPGA(s), or a prototype small animal PET was used to input signals from detectors to FPGA(s).*

The following figure shows that, with paired signals fed to two fixed input channels, all coincidence events were measured at one fixed coincidence pair, labeled as coincidence index number 3. The numbers of input and output coincidence events were the same.



The following figures show that the external digital event trigger signals have randomness similar to that of the internal simulated event signals.



The following figures show that the measured event rate equals the expected event rate.



- DCP performance evaluated with input signals from a PET scanner:

- With a single FPGA

A prototype small animal PET was used to evaluate DCP performance based on its timing measurements. The PET has 12 detectors in total, and their output singles event trigger signals (digital timing signals) can be fed to the DCP inputs. More technical details of the PET are described in [2, 3].

The following shows that when only detectors 1, 4, 7, and 10 were used, their output signals were fed into the DCP and paired to coincidence pairs 0, 3, 6, 81, 84, and 156 under the specific PET configuration for the nominal FOV size. With no positron emission source inside the PET, the *random events* caused by LYSO background radiation were recorded in a 10-minute PET acquisition with a $\pm 50$ ns timing window. As shown below, the counts on each coincidence pair were about the same, and the full width at half maximum (FWHM) of the timing-difference distribution was measured as 31.61 ns. Because random coincidence events are uncorrelated in time, the coincidence timing distribution is expected to be close to triangular.



The following shows a similar pattern for random events measured from 10 detectors in another 10-minute PET acquisition with a $\pm 50$ ns timing window. As expected, a total of 28 coincidence pairs were measured. The measured FWHM of the timing-difference distribution was 31.76 ns. Detectors 11 and 12 had technical issues during the study and were not used; however, this should not affect the evaluation of DCP performance.
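
The pair counts quoted above (6 pairs for four detectors, 28 pairs for ten) can be reproduced with a simple geometric rule. The sketch assumes 12 detectors equally spaced on a ring and treats a pair as valid when the two detectors are at least three positions apart. That rule is an assumption chosen because it matches the counts in the text; it is not taken from the DCP pair list, and `valid_pairs` is a hypothetical name.

```python
from itertools import combinations


def valid_pairs(detectors, ring_size=12, min_separation=3):
    """Pairs whose ring separation is at least min_separation (assumed rule)."""
    pairs = []
    for a, b in combinations(sorted(detectors), 2):
        step = (b - a) % ring_size
        separation = min(step, ring_size - step)
        if separation >= min_separation:
            pairs.append((a, b))
    return pairs


print(len(valid_pairs([1, 4, 7, 10])))   # four detectors
print(len(valid_pairs(range(1, 11))))    # detectors 1-10, with 11 and 12 unused
```

Raising `min_separation` to 6 keeps only directly opposite detectors, which are the pairs a point source at the center of the ring would be expected to fire.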



With a Na-22 point source placed at the center of the field of view (CFOV), the output event trigger signals of detectors 1, 4, 7, and 10 were fed to the DCP. The following figures show that the counts for coincidence pairs 3 and 84 increased substantially; these correspond to the only two detector pairs expected to be in coincidence for the 511 keV gamma rays. There was also a small number of random events from LYSO background radiation. The measured FWHM of the coincidence timing distribution was 20.82 ns. This wide FWHM arose because no timing alignment or correction was applied to the timing determined by the simple leading-edge discrimination threshold. The DCP should not affect the coincidence time resolution, as its sole function is to accept pairs of events whose timing difference falls within the timing window.
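
For readers who want to reproduce an FWHM figure from a timing-difference histogram, a minimal sketch is given below. It assumes a single-peaked histogram and uses linear interpolation at half maximum; this is a common approach, not necessarily the analysis used for the values reported here, and the function name `fwhm` and the synthetic test histogram are hypothetical.

```python
def fwhm(centers, counts):
    """FWHM of a single-peaked histogram, interpolating linearly at half maximum."""
    half = max(counts) / 2.0
    above = [i for i, c in enumerate(counts) if c >= half]
    left, right = above[0], above[-1]

    def crossing(i_out, i_in):
        # Interpolate between a bin below half maximum and its neighbor at or above it.
        x0, y0 = centers[i_out], counts[i_out]
        x1, y1 = centers[i_in], counts[i_in]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    x_left = centers[left] if left == 0 else crossing(left - 1, left)
    x_right = (centers[right] if right == len(counts) - 1
               else crossing(right + 1, right))
    return x_right - x_left


# Synthetic triangular histogram on a 1 ns grid (not measured data).
centers = list(range(-50, 51))
counts = [max(0, 32 - abs(x)) for x in centers]
print(fwhm(centers, counts))
```

For a noisy histogram with several bins crossing half maximum, this sketch measures between the outermost crossings, so smoothing or fitting may be preferable in practice.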



Still with the Na-22 point source at the CFOV, the output event trigger signals of detectors 1, 2, 3, 4, 7, 8, 9, and 10 were fed to the DCP. The following figures show that the counts for four coincidence pairs increased substantially; these correspond to the only four detector pairs expected to be in coincidence for the 511 keV gamma rays. The measured FWHM of the coincidence timing distribution was 23.59 ns, slightly worse than the 20.82 ns from two detector pairs, mainly because there was no timing alignment among these detectors.



- With two FPGAs

The following shows the timing measured from coincidence events generated by random events from detector background radiation. The PET acquisition time (10 min) and timing window ($\pm 50$ ns) were the same as in the single-FPGA study. Detectors 1–5 were connected to one FPGA and detectors 6–10 to the other. The measured FWHM of the timing-difference distribution was 32.40 ns, close to the 31.76 ns measured with a single FPGA.



With the same Na-22 point source at the CFOV, the measured FWHM of the timing-difference distribution with four detector pairs was 24.47 ns, close to the 23.59 ns measured with the single FPGA.



### F. Conclusion

Overall, the above evaluations show that the DCP can provide appropriate functionality and performance for PET coincidence event acquisition.

### G. References

1. Cheng, X., et al., *Field-programmable-gate-array-based distributed coincidence processor for high count-rate online positron emission tomography coincidence data acquisition*. Phys Med Biol, 2021. **66**(5): p. 055009.
2. Cheng, X., et al., *A compact and lightweight small animal PET with uniform high-resolution for onboard PET/CT image-guided preclinical radiation oncology research*. Phys Med Biol, 2021. **66**(21).
3. Cheng, X., et al., *Integrated Small Animal PET/CT/RT with Onboard PET/CT Image Guidance for Preclinical Radiation Oncology Research*. Tomography, 2023. **9**(2): p. 567-578.